functions, with special emphasis on ill-conditioned or nonsmooth functions. Weighted
least squares analysis is used to come up with a new bundle strategy for generating
directions that provide descent with respect to nearby subgradients. These directions
combine affine scaling and modified Newton centering on level set approximations of
the objective function. A relationship of the algorithm to
central cutting plane methods is derived. Global convergence of the resulting algorithm
is proved for the case when the objective function is smooth. A special case of
the algorithm reduces to a conjugate gradient method. Computational results com-

schemes  are  presented. 


CHAPTER 1
INTRODUCTION 


1.1 Overview


algorithms consist of two important steps:


(a) finding a direction of search d, and


use of weighted least squares (WLS) theory in finding the search direction. Although
WLS solutions find their way into some mathematical programming algorithms, they
are usually incidental to the method and often appear after the fact (see, for example,
Stone and Tovey [49], which casts the simplex and projective algorithms as weighted
least squares methods, and Todd [50], which relates the ellipsoidal and projective
methods through the weighted least squares subproblems embedded in each algorithm).
In this dissertation, we systematically use WLS theory as a strategy for finding the
search direction. We expect to show how WLS theory provides a way of integrating
and balancing the often conflicting conditions that an acceptable search direction is
required to satisfy.

To illustrate this approach, let $g_i$, $i \in I$, be a set of vectors. Some of these vectors
may represent gradients of linear constraints (hyperplanes and halfspaces) while some
may be gradients or subgradients of an objective function. Others may represent
directions which in one way or another are found to be relevant to the problem at
hand. A search direction d may then be required to satisfy any of the following:

1. The direction d must be parallel to (as in gradient projection), away
from (as in centering), or toward a collection of constraints.


in particular).

3. d must be nearly collinear with some gradients and orthogonal or almost
orthogonal to others (Shor's space dilation method, ellipsoid methods).

4. d must satisfy combinations of the above, with some conditions absolute, and
others with varying degrees of necessity.

In short, the direction finding subproblem finds a direction d satisfying any of the
following according to varying degrees (weights) of realisation:

$$g_i^T d > 0, \quad i \in I^+ \subset I$$

$$g_i^T d = 0, \quad i \in I^0 \subset I$$

$$g_i^T d < 0, \quad i \in I^- \subset I.$$

This research is concerned with using weighted least squares theory in solving the
direction finding subproblem above. The general idea is as follows. Suppose we can
quantify the degrees of realisation for the requirements on the direction d with respect
to the vectors $g_i$, $i \in I$, by the positive weights $w_i$. The larger the weight, the stronger
the enforcement of the requirement must be. A solution to the direction finding
subproblem is then a vector d solving the linear system

$$w_i\,(g_i^T d = +1), \quad i \in I^+ \qquad (1.1)$$

$$w_i\,(g_i^T d = 0), \quad i \in I^0 \qquad (1.2)$$

$$w_i\,(g_i^T d = -1), \quad i \in I^- \qquad (1.3)$$

Depending on the information available for any particular problem, the system (1.1)
to (1.3) can be generalized by changing the +1 in (1.1) to some $\alpha_i > 0$ and the -1
in (1.3) to some $\alpha_i < 0$. Note that in most cases, the above conditions are conflicting,
so the system has no exact solution; it does, of course, have a weighted least squares
solution. It is well known that a large weight on an equation tends to make the
weighted least squares solution more nearly satisfy that
equation.  In  addition,  constrained  WLS  theory  provides  a way  of  imposing  conditions 

strongly  they  are  to  be  imposed,  and  how  to  quantify  these  into  fixed  weights  will 
depend on the problem class being considered. It is the object of this research to
answer these questions for a variety of mathematical programming problems. This
dissertation is structured in a manner similar to reverse engineering. We analyse
certain directions that are being used in practice from the point of view of WLS
theory. This analysis gives insights into the directions' strengths and weaknesses.
More importantly, the analysis provides clues as to how to apply the WLS technique
in coming up with new directions with more desirable properties.

We  begin  in  the  following  section  with  a brief  review  of  WLS  theory.  Emphasis  will 
be given to properties that we shall need in undertaking the above WLS approach.

Chapter 2 applies WLS direction finding to the analytic center problem. We first
generalize the concept of Sonnevend's [47] analytic center to include centers arising
from a family of logarithmic and inverse barrier functions. Descent directions for these
barrier functions are considered to be centering directions. The Newton directions for
these functions are then formulated (reverse engineered) as solutions to a WLS
problem. Similar to a result of Megiddo and Shub [38], this analysis shows that the
Newton centering direction does not behave properly near the boundary, tending to
be parallel to nearby constraints, whereas the objective of centering is to move away
from nearby constraints. This analysis in turn leads to a modified Newton centering
direction which has a better boundary behavior than the Newton centering direction.


This direction will play an important role in applying the WLS technique to other
classes of mathematical programming problems to be considered below.

Chapter 3 deals with linear programming, where the dual affine scaling (DAS)

affine scaling algorithm and the need for a centering component, especially when
the iterates are close to the boundary. The chapter then provides an algorithmic
framework for combining the DAS direction with any centering direction in order
to overcome difficulties associated with very bad initial iterates. This algorithmic
framework is used in presenting computational results that compare the effects of

indicate that the modified Newton direction is a more suitable centering component
to the DAS direction for practical interior point implementations.

Chapter 4 synthesizes the techniques developed thus far into a bundle-descent
algorithm for minimising nonsmooth functions. A search direction that provides
descent with respect to nearby subgradients is developed from the centering properties
of the modified Newton centering direction. The WLS direction finding subproblem
uses affine scaling and modified Newton centering on level set approximations of the
objective function. Traditional bundle methods use a direction which approaches
the steepest descent direction when the $\epsilon$ parameter approaches zero, making them
susceptible to zigzagging. In contrast, the WLS descent direction avoids zigzagging
by always taking into account nearby gradients. We prove global convergence of the
algorithm for the case when the objective function is smooth. We also show that a
special case of the algorithm reduces to a conjugate gradient method. Relationships of
the algorithm to central cutting plane methods are explored. The chapter concludes


Finally, Chapter 5 gives a summary and explores extensions for further research.


Notation. Throughout the paper we shall use the following notation. For a given
vector $x$, $\|x\|$ denotes the Euclidean norm. We will at times use the symbol $\langle x, y\rangle$
to denote the inner product. The weighted least squares problem $\min_x \|W(Ax - b)\|^2$
will be denoted by $W(Ax \cong b)$ or $WAx \cong Wb$.


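Let $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, and $\mathrm{rank}(A) = n$. The least squares problem denoted
by $Ax \cong b$ finds the vector $x \in \mathbb{R}^n$ which minimizes $\|Ax - b\|^2$. The solution to
$Ax \cong b$ is given by the normal equations

$$A^TAx = A^Tb, \quad\text{i.e.,}\quad x = (A^TA)^{-1}A^Tb.$$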
Let $W = \mathrm{diag}(w_i)$, where $w_i > 0\ \forall i$. The weighted least squares problem
$WAx \cong Wb$ or $W(Ax \cong b)$ is solved by

$$x = (A^TW^2A)^{-1}A^TW^2b. \qquad (1.4)$$

If $w_i$ is large, the tendency is to make the residual $r_i = b_i - a_i^Tx$ small, i.e., a large
weight on the $i$th equation $a_i^Tx = b_i$ tends to make the least squares solution satisfy
the $i$th equation (Golub and Van Loan [13]). This property is the key to solving the
system (1.1) to (1.3).
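As a small numerical illustration of this weighting property (a sketch on assumed data, not an example from the original text), the code below applies (1.4) to a deliberately conflicting instance of (1.1) to (1.3). The vectors in `G`, the targets and the weights are hypothetical. `solve` and `wls` are illustrative plain-Python helpers that perform Gaussian elimination on the normal equations; they are not a library API.

```python
def solve(M, v):
    """Solve the square system M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    a = [list(M[i]) + [v[i]] for i in range(n)]  # augmented matrix [M | v]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (a[i][n] - sum(a[i][j] * x[j] for j in range(i + 1, n))) / a[i][i]
    return x


def wls(A, b, w):
    """WLS solution of W(Ax = b), W = diag(w), from the normal equations behind (1.4)."""
    m, n = len(A), len(A[0])
    M = [[sum(w[i] ** 2 * A[i][r] * A[i][c] for i in range(m)) for c in range(n)]
         for r in range(n)]
    v = [sum(w[i] ** 2 * A[i][r] * b[i] for i in range(m)) for r in range(n)]
    return solve(M, v)


# Hypothetical rows g_i^T of G and targets for (1.1)-(1.3):
# g1^T d = +1 (I+), g2^T d = 0 (I0), g3^T d = -1 (I-).
# They conflict: g1 and g2 force d = (1, 0), which gives g3^T d = +1.
G = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
targets = [1.0, 0.0, -1.0]

for w3 in (1.0, 10.0, 100.0):
    d = wls(G, targets, [1.0, 1.0, w3])
    r3 = abs(G[2][0] * d[0] + G[2][1] * d[1] - targets[2])
    print(f"w3 = {w3:6.1f}  d = ({d[0]:+.5f}, {d[1]:+.5f})  |g3^T d + 1| = {r3:.2e}")
```

For this instance the normal equations can be solved by hand. The residual of the third equation is $2/(1 + 2w_3^2)$, so it falls from about 0.67 at $w_3 = 1$ to about $10^{-4}$ at $w_3 = 100$.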

Consider now the least squares problem subject to equality constraints:

$$\min_x \|Ax - b\|^2 \quad \text{s.t.} \quad Bx = d, \qquad (1.5)$$


where $B \in \mathbb{R}^{m_1 \times n}$ with $\mathrm{rank}(B) = m_1 < n$. The following theorem and corollary hold
(for proofs, see Golub and Van Loan [13], Lawson and Hanson [28], and Farebrother [4]).
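Theorem 1 Consider the WLS solution $x(\epsilon)$ to

$$\begin{bmatrix} \epsilon A \\ B \end{bmatrix} x \cong \begin{bmatrix} \epsilon b \\ d \end{bmatrix}, \quad \epsilon > 0.$$

Let $\bar{x}$ be the solution to the constrained least squares problem (1.5). Then $x(\epsilon) \to \bar{x}$ as $\epsilon \to 0$.

Corollary 1 Consider the WLS solution $x(w)$ to

$$\begin{bmatrix} A \\ wB \end{bmatrix} x \cong \begin{bmatrix} b \\ wd \end{bmatrix}, \quad w > 0.$$

Then $x(w) \to \bar{x}$ as $w \to \infty$.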

Note that the theorem and corollary are equivalent with $w = 1/\epsilon$.

Based on these, a constraint may be imposed by assigning it a large enough
weight or by downweighting the other equations with a small enough weight. Thus,
this provides a way of imposing conditions that have to be absolutely satisfied by
a direction finding subproblem. These results will be used in solving the direction
finding subproblem (1.1) to (1.3).
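The weighting idea behind Theorem 1 and Corollary 1 can be checked on a tiny instance of (1.5). In the sketch below the data `A`, `b`, `B` and `d_rhs` are hypothetical. `x_of_w` stacks the constraint rows with weight $w$ (Corollary 1), and `x_of_eps` scales the objective rows by $\epsilon$ (Theorem 1). Both are illustrative helpers, not code from the original work.

```python
def solve(M, v):
    """Solve the square system M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    a = [list(M[i]) + [v[i]] for i in range(n)]  # augmented matrix [M | v]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (a[i][n] - sum(a[i][j] * x[j] for j in range(i + 1, n))) / a[i][i]
    return x


def lsq(rows, rhs):
    """Ordinary least squares via the normal equations (rows must have full column rank)."""
    n = len(rows[0])
    M = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    v = [sum(r[i] * t for r, t in zip(rows, rhs)) for i in range(n)]
    return solve(M, v)


def x_of_w(A, b, B, d_rhs, w):
    """Corollary 1: keep A x ~ b and multiply the constraint rows B x ~ d by w."""
    return lsq(A + [[w * t for t in r] for r in B], b + [w * t for t in d_rhs])


def x_of_eps(A, b, B, d_rhs, eps):
    """Theorem 1: multiply A x ~ b by eps and keep the constraint rows B x ~ d."""
    return lsq([[eps * t for t in r] for r in A] + B, [eps * t for t in b] + d_rhs)


# Hypothetical instance of (1.5): min ||Ax - b||^2 subject to x1 - x2 = 0.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 0.0]
B = [[1.0, -1.0]]
d_rhs = [0.0]
x_bar = [0.5, 0.5]  # minimizer of (t-1)^2 + (t-2)^2 + (2t)^2 with x1 = x2 = t

for w in (1.0, 10.0, 1000.0):
    xw = x_of_w(A, b, B, d_rhs, w)
    xe = x_of_eps(A, b, B, d_rhs, 1.0 / w)
    err = max(abs(p - q) for p, q in zip(xw, x_bar))
    diff = max(abs(p - q) for p, q in zip(xw, xe))
    print(f"w = {w:7.1f}  x(w) = ({xw[0]:.7f}, {xw[1]:.7f})  "
          f"error = {err:.1e}  |x(w) - x(1/w)| = {diff:.1e}")
```

Working the first-order conditions by hand for this instance gives $x_1 + x_2 = 1$ and $x_1 - x_2 = -1/(1 + 2w^2)$, so the error decays like $1/w^2$. The two weightings agree because the Theorem 1 objective equals the Corollary 1 objective multiplied by $\epsilon^2 = 1/w^2$.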


CHAPTER  2 

ANALYTIC  CENTERING 


2.1 The p-Center

Analytic centers (Sonnevend [47, 48]) based on the logarithmic barrier function

of barrier functions based on the logarithmic and inverse-power barrier functions.
These barrier functions have been used in the context of barrier methods for
nonlinear programming (see Frisch [7] and Fiacco and McCormick [5] among others).
Consider the feasible polyhedral set of the linear program (LD):

$$F_x = \{x : Ax \le b\} = \{x : a_i^Tx \le b_i,\ i = 1, 2, \ldots, m\}.$$

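The p-barrier function over $\mathrm{int}(F_x) = \{x : Ax < b\}$ is defined as the function

$$f_p(x) = -\sum_{i=1}^m \ln(b_i - a_i^Tx), \quad p = 0,$$

$$f_p(x) = \frac{1}{p}\sum_{i=1}^m (b_i - a_i^Tx)^{-p}, \quad p > 0, \qquad (2.1)$$

where p is the power parameter of the barrier function.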

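The gradient of the p-barrier function is

$$\nabla f_p(x) = A^TS^{-(p+1)}e, \qquad (2.2)$$

where $S = \mathrm{diag}(s)$ with $s = b - Ax$ and $e$ is the vector of ones, and its Hessian is

$$\nabla^2 f_p(x) = (p+1)A^TS^{-(p+2)}A.$$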
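The closed forms (2.2) and the Hessian above can be checked against central finite differences. The sketch below does this on a hypothetical triangle. The function names are illustrative, and the finite-difference step `h` is an assumed value.

```python
import math


def slacks(A, b, x):
    return [bi - sum(aij * xj for aij, xj in zip(row, x)) for row, bi in zip(A, b)]


def f_p(A, b, x, p):
    """p-barrier (2.1): -sum ln s_i if p == 0, else (1/p) sum s_i^(-p)."""
    s = slacks(A, b, x)
    if p == 0:
        return -sum(math.log(si) for si in s)
    return sum(si ** (-p) for si in s) / p


def grad_f_p(A, b, x, p):
    """(2.2): A^T S^{-(p+1)} e."""
    s = slacks(A, b, x)
    return [sum(row[j] * si ** (-(p + 1)) for row, si in zip(A, s)) for j in range(len(x))]


def hess_f_p(A, b, x, p):
    """(p + 1) A^T S^{-(p+2)} A."""
    s = slacks(A, b, x)
    n = len(x)
    return [[(p + 1) * sum(row[r] * row[c] * si ** (-(p + 2)) for row, si in zip(A, s))
             for c in range(n)] for r in range(n)]


def shifted(x, j, h):
    y = list(x)
    y[j] += h
    return y


def check(A, b, x, p, h=1e-6):
    """Largest deviations of the closed forms from central finite differences."""
    n = len(x)
    g, H = grad_f_p(A, b, x, p), hess_f_p(A, b, x, p)
    g_err = max(abs(g[j] - (f_p(A, b, shifted(x, j, h), p)
                            - f_p(A, b, shifted(x, j, -h), p)) / (2 * h))
                for j in range(n))
    H_err = max(abs(H[r][c] - (grad_f_p(A, b, shifted(x, c, h), p)[r]
                               - grad_f_p(A, b, shifted(x, c, -h), p)[r]) / (2 * h))
                for r in range(n) for c in range(n))
    return g_err, H_err


# Hypothetical triangle x1 <= 1, x2 <= 1, x1 + x2 >= -1 and an interior point.
A = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [1.0, 1.0, 1.0]
x = [0.2, -0.1]
for p in (0, 1, 2):
    print(p, check(A, b, x, p))
```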

The Hessian can easily be shown to be positive definite when A has full column rank,
and hence the barrier functions defined above are strictly convex. As the values of
these barrier functions tend to


...minimum of $f_p(x)$ over $\mathrm{int}(F_x)$.


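of $F_x$, i.e.,

$$a_i^T\bar{x} = b_i, \qquad a_j^T\bar{x} < b_j, \quad j \neq i.$$

Let $\{x^k\}_{k=1}^{\infty}$ be a sequence of points in $\mathrm{int}(F_x)$ converging to $\bar{x}$. For all
$k = 1, 2, \ldots$, let $s^k = s(x^k) = b - Ax^k$, and let $d^k$ be the Newton p-centering direction
at $x^k$, so that

$$\lim_{k\to\infty} s_i^k = \bar{s}_i = 0, \qquad \lim_{k\to\infty} s_j^k = \bar{s}_j > 0, \quad \forall j \neq i.$$

Then

$$\lim_{k\to\infty} a_i^Td^k = 0.$$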


Figure 2.1. Newton trajectories from boundary


The key to the proof of the above theorem is the WLS formulation of the Newton
direction given by equations (2.5) to (2.7). Consider now the modified Newton


This is the solution to the WLS problem

$$\min_d \|S^{-(p+2)/2}(Ad + e)\|^2, \quad\text{or}\quad s_j^{-(p+2)/2}\,(a_j^Td = -1), \ j = 1, 2, \ldots, m. \qquad (2.9)$$

(Compare with (2.7).) This gives a negative inner product bounded away from zero
with the gradients of the nearest constraints, thus giving movement away from these
constraints. In addition, this would keep the rate of movement toward intermediate


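$$a_i^T\bar{x} = b_i, \qquad a_j^T\bar{x} < b_j, \quad j \neq i.$$

Let $\{x^k\}_{k=1}^{\infty}$ be a sequence of points in $\mathrm{int}(F_x)$ converging to $\bar{x}$. For all
$k = 1, 2, \ldots$, let $s^k = s(x^k) = b - Ax^k$, and let $\hat{d}^k$ be the modified Newton p-centering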


Figure 2.2. Modified Newton trajectories from boundary


near enough to the center. As we have seen, however, it behaves poorly near the
boundary. The modified Newton direction, on the other hand, is an excellent centering
direction from the boundary. It does not, however, possess the quadratic convergence
property of Newton's method. In the next two chapters, we shall see the need for a
direction that centers very well when near the boundary, and hence we shall have
occasion to use this direction and test its merits in practice.
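The contrast between the two directions near a facet can be seen numerically. The sketch below approaches the facet $x_1 = 1$ of a hypothetical triangle and prints $a_1^Td$ for two directions: the Newton p-centering direction $-(\nabla^2 f_p)^{-1}\nabla f_p$, and a modified Newton direction. For the modified direction it assumes, as one reading of (2.9), the WLS fit of $a_j^Td = -1$ with weights $s_j^{-(p+2)/2}$, which gives $-(A^TS^{-(p+2)}A)^{-1}A^TS^{-(p+2)}e$. That weighting is an assumption.

```python
def solve(M, v):
    """Solve the square system M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    a = [list(M[i]) + [v[i]] for i in range(n)]  # augmented matrix [M | v]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (a[i][n] - sum(a[i][j] * x[j] for j in range(i + 1, n))) / a[i][i]
    return x


def centering_directions(A, b, x, p):
    """Return (Newton p-centering direction, assumed modified Newton direction) at x."""
    m, n = len(A), len(A[0])
    s = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
    # M = A^T S^{-(p+2)} A; the Hessian of f_p is (p + 1) M.
    M = [[sum(A[i][r] * A[i][c] * s[i] ** (-(p + 2)) for i in range(m)) for c in range(n)]
         for r in range(n)]
    grad = [sum(A[i][j] * s[i] ** (-(p + 1)) for i in range(m)) for j in range(n)]
    rhs = [sum(A[i][j] * s[i] ** (-(p + 2)) for i in range(m)) for j in range(n)]
    d_newton = [-t / (p + 1) for t in solve(M, grad)]  # -(Hessian)^{-1} * gradient
    d_modified = [-t for t in solve(M, rhs)]           # assumed WLS fit of a_j^T d = -1
    return d_newton, d_modified


# Hypothetical triangle x1 <= 1, x2 <= 1, x1 + x2 >= -1; facet 1 is x1 = 1 and a_1 = (1, 0).
A = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [1.0, 1.0, 1.0]
for p in (0, 1):
    for t in (1e-1, 1e-2, 1e-4):
        dn, dm = centering_directions(A, b, [1.0 - t, 0.0], p)
        # with a_1 = (1, 0), a_1^T d is just the first component of d
        print(f"p = {p}  s_1 = {t:.0e}  a1^T d_newton = {dn[0]:+.2e}  "
              f"a1^T d_modified = {dm[0]:+.5f}")
```

In this instance $a_1^Td$ for the Newton direction shrinks roughly in proportion to $s_1$. For the assumed modified direction it stays close to $-1$, which matches the qualitative behavior described above.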


CHAPTER 3

LINEAR PROGRAMMING
3.1 Introduction

Interest in interior point algorithms for linear programming problems has grown
considerably since Karmarkar's pioneering report on the polynomial complexity of his
projective algorithm [21]. The affine scaling algorithm developed by Dikin [3] (see also
Barnes [2] and Vanderbei et al. [54]) and implemented in dual form by Adler et al. [1]
and Monma and Morton [40] was one of the first interior point algorithms that have
been shown to be competitive with the simplex method in solving linear programming
problems. The long-standing question of global convergence of the long-step affine
scaling algorithm for degenerate linear programs has just recently been settled by
Tsuchiya and Muramatsu [53], who established convergence given a step-size of a
fraction $\gamma \in (0, 2/3]$ of the way to the boundary (see also Monteiro et al. [41] for a
shorter proof). Global convergence for primal nondegenerate linear programs given a
step-size of any fraction $\gamma \in (0, 1)$ has been studied by several authors (Vanderbei et
al. [54], Gonzaga [15]).

Although global convergence has been settled, the question of robustness of the
affine scaling algorithm is still a concern. Megiddo and Shub [38] have shown that the
affine scaling direction tends to move parallel to nearby constraints. This can result
in very slow convergence, even numerical stalling, if the initial iterate is too close
to the boundary or if the step-size taken is too long. This has led Karmarkar and
Ramakrishnan [22] to conclude that "affine scaling can often get stuck close to the
boundaries and we do not recommend it as a practical robust algorithm" (page 556).


One way of overcoming this difficulty is to incorporate a suitable centering component

been proposed in theory or implemented in practice use search directions which are
combinations of an affine scaling direction and a centering direction based on the
Newton direction for the logarithmic barrier function (see Gonzaga [16] and Den
Hertog and Roos [20] for surveys of search directions with this property). However,
the issue of robustness is still unresolved and the choice of starting point is still very
important (Shanno [44]), and this has led Güler et al. [17] to remark that "a bad
initial point $(x^0, s^0)$ [meaning close to the boundary] can cause immediate problems
for any IPM [interior point method]" (page 11). This may be attributed to the fact
that, paradoxically, the Newton direction designed to center iterates away from the
boundary shares with the affine scaling direction the property of tending to move
parallel to nearby constraints if initiated near the boundary (recall Theorem 2 of
Chapter 2).

This chapter uses weighted least squares theory in analyzing problems of nonrobustness
of interior point methods associated with the poor near-boundary behavior
of the search directions. It shows how this difficulty may be resolved by using the
modified Newton centering direction developed in the preceding chapter.

3.2 The Dual Affine Scaling Direction as a Weighted Least Squares Solution
Consider the linear programming problem in dual standard form:

$$\text{(LD)} \qquad \max_x\ c^Tx \quad \text{s.t.} \quad Ax \le b,$$

where $A$ is an $m \times n$ matrix of full column rank, with $m > n$ and $c \ne 0$. The slack
variables of the problem are given by $s = b - Ax \ge 0$. Assume that the feasible region
$F_x = \{x : s = b - Ax \ge 0\}$ of (LD) is bounded and has a nonempty interior. Let $x^0$
be an interior feasible point with corresponding slack vector $s^0 = b - Ax^0 > 0$.


For any interior feasible point $x$ with slack $s > 0$, the dual affine scaling (DAS) direction
is defined to be

$$d_{DAS} = (A^TS^{-2}A)^{-1}c. \qquad (3.1)$$

Starting with $x^0$, the dual affine scaling algorithm (Adler et al. [1], Monma
and Morton [40]) solves (LD) by generating a sequence of interior feasible points
$\{x^1, x^2, \ldots\}$ defined by

$$x^{k+1} = x^k + \gamma\alpha_k d_{DAS}^k,$$

where $d_{DAS}^k$ is the DAS direction at $x^k$, $\alpha_k$ is the standard ratio test step-size which
would bring the next iterate to the boundary, and $\gamma \in (0, 1)$ is the safety factor which
ensures that the next iterate remains interior feasible. Note that since $A$ is of full
rank and $s^k > 0$ for all $k$, the matrix $(A^T(S^k)^{-2}A)^{-1}$ is symmetric positive definite, and
hence $d_{DAS}^k$ is an ascent direction: $c^Td_{DAS}^k = c^T(A^T(S^k)^{-2}A)^{-1}c > 0$ because $c \ne 0$.
The following theorem interprets this direction as a weighted least squares solution
and provides insights on its properties as an interior point search direction.
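The following is a bare-bones version of the DAS iteration. Everything in it is a hypothetical choice for illustration: the LP (maximize $x_1 + x_2$ over the unit square), the starting point, $\gamma = 0.9$ and the fixed iteration count. `solve` and `das_direction` are plain-Python helpers, not the implementation used in the computational experiments.

```python
def solve(M, v):
    """Solve the square system M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    a = [list(M[i]) + [v[i]] for i in range(n)]  # augmented matrix [M | v]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (a[i][n] - sum(a[i][j] * x[j] for j in range(i + 1, n))) / a[i][i]
    return x


def das_direction(A, s, c):
    """d_DAS = (A^T S^{-2} A)^{-1} c, equation (3.1)."""
    m, n = len(A), len(A[0])
    M = [[sum(A[i][r] * A[i][q] / s[i] ** 2 for i in range(m)) for q in range(n)]
         for r in range(n)]
    return solve(M, c)


def dual_affine_scaling(A, b, c, x, gamma=0.9, iters=13):
    """Run x^{k+1} = x^k + gamma * alpha_k * d^k for max c^T x s.t. A x <= b."""
    m, n = len(A), len(A[0])
    for _ in range(iters):
        s = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        d = das_direction(A, s, c)
        Ad = [sum(A[i][j] * d[j] for j in range(n)) for i in range(m)]
        ratios = [s[i] / Ad[i] for i in range(m) if Ad[i] > 0]
        if not ratios:              # A d <= 0: c^T x is unbounded along d
            raise ValueError("unbounded")
        alpha = min(ratios)         # ratio test: the step that reaches the boundary
        x = [x[j] + gamma * alpha * d[j] for j in range(n)]
    return x


# Hypothetical LP: maximize x1 + x2 over 0 <= x1, x2 <= 1 (optimum 2 at (1, 1)).
A = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
b = [1.0, 1.0, 0.0, 0.0]
c = [1.0, 1.0]
x = dual_affine_scaling(A, b, c, [0.2, 0.3])
print(x, 2.0 - (x[0] + x[1]))
```

The iteration count is kept small on purpose. The slacks of the two binding constraints shrink geometrically here, so after many more steps they would fall below the resolution of double precision near the vertex $(1, 1)$.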


Theorem 4 Let $x$ be an interior feasible point with corresponding slack vector $s > 0$.
Then $\forall \rho > 0$, the solution $d(\rho)$ to the constrained WLS system

$$\text{(WLS}(\rho)) \qquad \min_d\ \tfrac{1}{2}\|S^{-1}(Ad - 0)\|^2 \quad \text{s.t.} \quad c^Td = \rho \qquad (3.2)$$

is given by

$$d(\rho) = \frac{\rho}{c^Td_{DAS}}\, d_{DAS},$$

where $d_{DAS} = (A^TS^{-2}A)^{-1}c$ is the DAS direction. Conversely, any positive scaling of
the DAS direction is a solution to WLS($\rho$) for some $\rho > 0$.


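Proof. The Karush-Kuhn-Tucker conditions for WLS($\rho$) are given by

$$A^TS^{-2}Ad - \lambda c = 0,$$

$$c^Td = \rho.$$

The first condition gives $d = \lambda d_{DAS}$. The rest of the proof follows from $c^Td = \rho$ and
the definition of $d_{DAS}$ (3.1). □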
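Theorem 4 can be checked numerically. The sketch solves the KKT system of WLS($\rho$) directly and compares the result with the scaled DAS direction. The data (`A`, `b`, `c`, `x`, `rho`) are hypothetical, and the solver is a plain Gaussian elimination helper with partial pivoting, which also handles the zero diagonal entry of the KKT matrix.

```python
def solve(M, v):
    """Solve the square system M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    a = [list(M[i]) + [v[i]] for i in range(n)]  # augmented matrix [M | v]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (a[i][n] - sum(a[i][j] * x[j] for j in range(i + 1, n))) / a[i][i]
    return x


# Hypothetical data: a triangle, an interior point x, a cost vector c, and rho > 0.
A = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [1.0, 1.0, 1.0]
c = [1.0, 2.0]
x = [0.2, -0.1]
rho = 3.0
m, n = len(A), len(A[0])
s = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
M = [[sum(A[i][r] * A[i][q] / s[i] ** 2 for i in range(m)) for q in range(n)]
     for r in range(n)]

# KKT system of WLS(rho): M d - lam c = 0 and c^T d = rho, unknowns (d, lam).
K = [M[r] + [-c[r]] for r in range(n)] + [c + [0.0]]
sol = solve(K, [0.0] * n + [rho])
d_kkt, lam = sol[:n], sol[n]

d_das = solve(M, c)                                     # (3.1)
scale = rho / sum(ci * di for ci, di in zip(c, d_das))  # rho / (c^T d_DAS)
d_thm = [scale * t for t in d_das]                      # Theorem 4
print(d_kkt, d_thm, lam, scale)
```

The multiplier `lam` equals the scale factor $\rho/(c^Td_{DAS})$, which is exactly the step in the proof from $d = \lambda d_{DAS}$ to the stated formula.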

The preceding theorem implies that of all the possible ascent directions d (with
$c^Td = \rho$ for any $\rho > 0$), the DAS direction is the one which is as close as possible (in
a weighted least squares sense) to the null space of A, with the nearest constraints
having the largest weights. Denoting by $a_i^T$ the $i$th row of A, we see that the WLS
system (3.2) can be written as

$$\frac{1}{s_i}\,(a_i^Td \cong 0), \quad i = 1, \ldots, m \qquad (3.4)$$

$$\text{s.t.} \quad c^Td = \rho.$$

From WLS theory, the solution d would tend to satisfy more closely the equations in
the least squares system with the highest weights, in this case those with the smallest
slack variables. Megiddo and Shub's result on the boundary behavior of the DAS
direction now follows directly from this analysis: as x approaches, say, the jth facet,
the weight $1/s_j$ of the jth WLS equation in (3.4) tends to infinity and the inner
product $a_j^Td_{DAS}$ tends to zero, i.e., the direction would tend to be parallel to the

Thus, the pure dual affine scaling algorithm is not robust, as it could get stuck
in the presence of a few slack variables which dominate the above WLS system.
On the other hand, if there is no slack that dominates the WLS system, the DAS
direction would tend to have small inner products with most of the nearby and even
intermediate constraints, thus allowing for a substantial increase in the objective before
it hits an opposing constraint. Moreover, if it is near a facet that is binding at the
optimal solution, affine scaling would work very well since the only thing that will

the pure affine scaling direction at the tail iterations. Furthermore, Tsuchiya and
Muramatsu [53] have shown that for a step-size of a fraction $\gamma \in (0, 2/3]$ of the way
to the boundary, the asymptotic Q-linear rate of convergence of the affine scaling

What  is  needed  then  is  a search  direction  that  has  the  DAS  direction's  strengths 

appropriate  centering  direction. 
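The boundary behavior that follows from (3.4) can be observed on a hypothetical triangle. As $x$ approaches the facet $x_1 = 1$, the cosine between $a_1$ and $d_{DAS}$ goes to zero, even though $c = (1, 1)$ has a component pointing toward that facet. For this particular instance the ratio of the two components of $d_{DAS}$ works out by hand to $s_1^2$, so the cosine is $s_1^2/\sqrt{1 + s_1^4}$. The data and the helper names are assumptions made for the sketch.

```python
import math


def solve(M, v):
    """Solve the square system M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    a = [list(M[i]) + [v[i]] for i in range(n)]  # augmented matrix [M | v]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (a[i][n] - sum(a[i][j] * x[j] for j in range(i + 1, n))) / a[i][i]
    return x


# Hypothetical triangle x1 <= 1, x2 <= 1, x1 + x2 >= -1; facet 1 is x1 = 1 with a_1 = (1, 0).
A = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [1.0, 1.0, 1.0]
c = [1.0, 1.0]  # has a positive component along a_1


def das_cosine(t):
    """Cosine of the angle between a_1 and d_DAS = (A^T S^-2 A)^{-1} c at x = (1 - t, 0)."""
    x = [1.0 - t, 0.0]
    s = [bi - sum(a * xj for a, xj in zip(row, x)) for row, bi in zip(A, b)]
    M = [[sum(row[r] * row[q] / si ** 2 for row, si in zip(A, s)) for q in range(2)]
         for r in range(2)]
    d = solve(M, c)
    return d[0] / math.hypot(d[0], d[1])  # a_1 has unit length


for t in (1e-1, 1e-2, 1e-4):
    print(f"s_1 = {t:.0e}   cos(a_1, d_DAS) = {das_cosine(t):.3e}")
```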

3.3 An Algorithmic Framework for Affine Scaling with Centering
The Newton direction for the logarithmic barrier function

$$d_0 = -(A^TS^{-2}A)^{-1}A^TS^{-1}e \qquad (3.5)$$

and the direction

$$d_1 = -(A^TS^{-2}A)^{-1}A^TS^{-2}e \qquad (3.6)$$

for the inverse barrier function are centering directions that can be combined
naturally with the DAS direction, as they use the same symmetric positive definite matrix
(the Hessian of the log barrier function) in their definitions. Hence, no extra
factorization will be necessary to compute either direction once the DAS direction has

these directions.

A theoretical justification for combining the DAS direction with the Newton
centering direction can be derived via a Newton log barrier technique. Consider the

$$(P(\mu_k)) \qquad \max_x\ c^Tx - \mu_k f_0(x) \quad \text{s.t.} \quad Ax < b,$$

where $f_0(x)$ is the logarithmic barrier function. The Newton log barrier technique
solves $P(\mu_k)$ approximately for a sequence of barrier parameters $\{\mu_k\}$ which approach
zero in the limit. It can be shown that the Newton direction for the above problem
is, up to the positive factor $1/\mu_k$, given by

$$d_N = d_{DAS} + \mu_k d_0,$$


a direction that is a combination of an affine scaling direction and the Newton
centering direction for the logarithmic barrier function ([16], [20]). These algorithms differ
mostly in how the barrier parameter $\mu_k$ is reduced and in the length of the step-size
taken along the search direction. Theoretical polynomial convergence (the best being
$O(\sqrt{m}L)$ iterations to get within $2^{-L}$ of the optimal objective value) can be established

at least a subsequence of the generated iterates close to the central trajectory (the
locus of exact optimal solutions to $P(\mu_k)$ as $\mu_k$ ranges from $\infty$ to zero). However,
adhering to these restrictions in practice results in slow convergence: the number of
iterations usually approaches the theoretical worst case bounds. Thus, most practi-

barrier parameter and a step-size of a fraction $\gamma$ of the way to the boundary, where $\gamma$
is close to one.
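The decomposition of the Newton direction for $P(\mu_k)$ can be verified numerically. For $\phi(x) = c^Tx - \mu f_0(x)$ the gradient is $c - \mu A^TS^{-1}e$ and the Hessian is $-\mu A^TS^{-2}A$, so the exact Newton step equals $d_{DAS}/\mu + d_0$ with $d_0 = -(A^TS^{-2}A)^{-1}A^TS^{-1}e$. The data and the value of `mu` below are hypothetical.

```python
def solve(M, v):
    """Solve the square system M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    a = [list(M[i]) + [v[i]] for i in range(n)]  # augmented matrix [M | v]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (a[i][n] - sum(a[i][j] * x[j] for j in range(i + 1, n))) / a[i][i]
    return x


A = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # hypothetical data
b = [1.0, 1.0, 1.0]
c = [1.0, 2.0]
x = [0.2, -0.1]
mu = 0.05
m, n = len(A), len(A[0])
s = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
M = [[sum(A[i][r] * A[i][q] / s[i] ** 2 for i in range(m)) for q in range(n)]
     for r in range(n)]
g0 = [sum(A[i][j] / s[i] for i in range(m)) for j in range(n)]  # A^T S^{-1} e

# Exact Newton step for phi(x) = c^T x - mu f_0(x): -(Hessian)^{-1} grad = (mu M)^{-1}(c - mu g0).
d_newton = solve([[mu * t for t in row] for row in M], [ci - mu * gi for ci, gi in zip(c, g0)])

d_das = solve(M, c)                    # (3.1)
d0 = [-t for t in solve(M, g0)]        # Newton centering direction for f_0
combo = [u / mu + v for u, v in zip(d_das, d0)]
print(d_newton, combo)
```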

The merits of the Newton centering direction lie in the local quadratic convergence
of Newton's method. This property is exploited in theoretical polynomial algorithms
by ensuring that the generated iterates are never too far from the central
trajectory. Practical algorithms, however, use step-sizes that take the iterates close
to the boundary, i.e., away from the central trajectory. From a purely pragmatic


«i  = ,5*10-‘. 
To ensure global convergence without any degeneracy assumptions, we follow the
prescription of Tsuchiya and Muramatsu [53]. They suggested that a switch be made
at the final iterations to the pure affine scaling algorithm with a step-size of 2/3
of the way to the boundary. In the algorithm below, we make the switch when the
change in the objective function value satisfies

$$|c^Tx^k - c^Tx^{k-1}| < 0.1 \cdot \max\{1, |c^Tx^{k-1}|\}. \qquad (3.8)$$

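and the dual estimates. Given the interior iterate $x^k$ and corresponding slack $s^k$, a
tentative dual solution (Adler et al. [1]) is given by

$$y^k = (S^k)^{-2}Ad_{DAS}^k. \qquad (3.9)$$

Note that $A^Ty^k = A^T(S^k)^{-2}Ad_{DAS}^k = c$, so $y^k$ always satisfies the dual equality
constraints. To ensure optimality, it is therefore enough to have

$$y^k \ge 0 \quad\text{and}\quad y_j^ks_j^k = 0, \quad j = 1, 2, \ldots, m.$$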

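The algorithm can then be terminated when the minimum normalised dual entry satisfies

$$\min\{y_j^k/\|y^k\| : j = 1, 2, \ldots, m\} > -\epsilon_1, \qquad (3.10)$$

the maximum normalised complementarity violation satisfies

$$\max\{|y_j^ks_j^k|/(\|y^k\|\,\|s^k\|) : j = 1, 2, \ldots, m\} < \epsilon_2, \qquad (3.11)$$

and the relative duality gap satisfies

$$|b^Ty^k - c^Tx^k|/|b^Ty^k| < \epsilon_3, \qquad (3.12)$$

for given small positive tolerances $\epsilon_1$, $\epsilon_2$, and $\epsilon_3$.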
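The following sketch computes the dual estimate (3.9) and the three stopping measures (3.10) to (3.12), assuming Euclidean norms throughout. The square LP, the trial point and the tolerances are hypothetical. The point lies $10^{-6}$ from the optimal vertex, so the measures are small but not below $10^{-9}$.

```python
import math


def solve(M, v):
    """Solve the square system M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    a = [list(M[i]) + [v[i]] for i in range(n)]  # augmented matrix [M | v]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (a[i][n] - sum(a[i][j] * x[j] for j in range(i + 1, n))) / a[i][i]
    return x


def dual_estimate(A, s, c):
    """(3.9): y = S^{-2} A d_DAS, which satisfies A^T y = c by construction."""
    m, n = len(A), len(A[0])
    M = [[sum(A[i][r] * A[i][q] / s[i] ** 2 for i in range(m)) for q in range(n)]
         for r in range(n)]
    d = solve(M, c)
    return [sum(A[i][j] * d[j] for j in range(n)) / s[i] ** 2 for i in range(m)]


def stopping_measures(A, b, c, x):
    """Left-hand sides of (3.10), (3.11), and (3.12) at the interior point x."""
    s = [bi - sum(a * xj for a, xj in zip(row, x)) for row, bi in zip(A, b)]
    y = dual_estimate(A, s, c)
    ny = math.sqrt(sum(t * t for t in y))
    ns = math.sqrt(sum(t * t for t in s))
    min_dual = min(y) / ny
    max_comp = max(abs(yi * si) for yi, si in zip(y, s)) / (ny * ns)
    by = sum(bi * yi for bi, yi in zip(b, y))
    rel_gap = abs(by - sum(ci * xi for ci, xi in zip(c, x))) / abs(by)
    return y, (min_dual, max_comp, rel_gap)


def converged(measures, eps1, eps2, eps3):
    min_dual, max_comp, rel_gap = measures
    return min_dual > -eps1 and max_comp < eps2 and rel_gap < eps3


A = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]  # hypothetical unit square
b = [1.0, 1.0, 0.0, 0.0]
c = [1.0, 1.0]
x = [1.0 - 1e-6, 1.0 - 1e-6]  # near the optimal vertex (1, 1)
y, meas = stopping_measures(A, b, c, x)
print(y, meas)
print(converged(meas, 1e-9, 1e-9, 1e-9), converged(meas, 1e-5, 1e-5, 1e-5))
```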


The algorithmic framework can now be presented as follows. The default values
of the parameters used in the computational results in the next section are given in

An algorithmic framework for dual affine scaling with centering

• Step 1: Let $x^0$ be an interior feasible point. Let $d_{cen}(x)$ be a function which
returns a centering direction from any interior point $x$. Let $\epsilon_1$, $\epsilon_2$ (= 1e-9),
$\epsilon_3$ (= 1e-9), and $\epsilon_4$ (= 1e-8) be the tolerances. Let $\gamma$ (= .99) be the
initial step-size parameter. Set $k = 0$.

• Step 2: Compute the search direction $d^k$ using (3.7).

• Step 3: if $Ad^k \le 0$, return('unbounded') endif

• Step 4: Let $x^{k+1} = x^k + \gamma\alpha_{max}d^k$, where

$$\alpha_{max} = \min\{s_j^k/a_j^Td^k : a_j^Td^k > 0,\ j = 1, 2, \ldots, m\}$$

is the maximum feasible step in the direction $d^k$.

• Step 5: if the switching criterion (3.8) is met
set $d_{cen}(x) = 0\ \forall x$ (switch to a pure DAS direction henceforth)
set $\gamma = 2/3$ (use a 2/3 step-size henceforth)

• Step 6: Compute dual estimates according to (3.9).

• Step 7:
if the stopping criteria (3.10), (3.11), and (3.12) are not met
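The steps of the framework can be sketched as a loop. The combination rule (3.7) is not reproduced here, so the sketch uses a hypothetical stand-in: the normalized DAS direction plus `theta` times the normalized centering direction. The centering direction passed in is $-(A^TS^{-2}A)^{-1}A^TS^{-2}e$, as (3.6) is read here. The test LP, `theta` and the starting point near the facets $x_1 = 1$ and $x_2 = 0$ are assumptions, and the tolerance $\epsilon_4$ is not used.

```python
import math


def solve(M, v):
    """Solve the square system M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    a = [list(M[i]) + [v[i]] for i in range(n)]  # augmented matrix [M | v]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (a[i][n] - sum(a[i][j] * x[j] for j in range(i + 1, n))) / a[i][i]
    return x


def matvec(A, x):
    return [sum(a * t for a, t in zip(row, x)) for row in A]


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def normal_matrix(A, s):
    """A^T S^{-2} A, shared by d_DAS and the centering directions."""
    n = len(A[0])
    return [[sum(row[r] * row[q] / si ** 2 for row, si in zip(A, s)) for q in range(n)]
            for r in range(n)]


def d_cen_inverse(A, s):
    """Centering direction -(A^T S^-2 A)^{-1} A^T S^-2 e."""
    rhs = [sum(row[j] / si ** 2 for row, si in zip(A, s)) for j in range(len(A[0]))]
    return [-t for t in solve(normal_matrix(A, s), rhs)]


def unit(v):
    nv = math.sqrt(dot(v, v))
    return [t / nv for t in v] if nv > 0 else list(v)


def das_with_centering(A, b, c, x, d_cen, theta=1.0, gamma=0.99,
                       eps=(1e-9, 1e-9, 1e-9), max_iter=200):
    centering = True
    for k in range(max_iter):
        s = [bi - t for bi, t in zip(b, matvec(A, x))]
        # Step 2 (hypothetical stand-in for (3.7)): unit DAS plus theta * unit centering.
        d = unit(solve(normal_matrix(A, s), c))
        if centering:
            d = [u + theta * v for u, v in zip(d, unit(d_cen(A, s)))]
        Ad = matvec(A, d)
        if all(t <= 0 for t in Ad):                                   # Step 3
            return "unbounded", x, k
        alpha_max = min(si / t for si, t in zip(s, Ad) if t > 0)      # Step 4
        x_new = [xj + gamma * alpha_max * dj for xj, dj in zip(x, d)]
        # Step 5: switching criterion (3.8), then pure DAS with a 2/3 step-size.
        if centering and abs(dot(c, x_new) - dot(c, x)) < 0.1 * max(1.0, abs(dot(c, x))):
            centering, gamma = False, 2.0 / 3.0
        x = x_new
        # Step 6: dual estimates (3.9) at the new iterate.
        s = [bi - t for bi, t in zip(b, matvec(A, x))]
        y = [t / si ** 2 for t, si in zip(matvec(A, solve(normal_matrix(A, s), c)), s)]
        # Step 7: stopping criteria (3.10)-(3.12).
        ny, ns, by = math.sqrt(dot(y, y)), math.sqrt(dot(s, s)), dot(b, y)
        if (min(y) / ny > -eps[0]
                and max(abs(yi * si) for yi, si in zip(y, s)) / (ny * ns) < eps[1]
                and abs(by - dot(c, x)) / abs(by) < eps[2]):
            return "optimal", x, k + 1
    return "max_iter", x, max_iter


# Hypothetical test: unit square, starting close to the facets x1 = 1 and x2 = 0.
A = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
b = [1.0, 1.0, 0.0, 0.0]
c = [1.0, 1.0]
status, x, iters = das_with_centering(A, b, c, [0.999, 0.001], d_cen_inverse)
print(status, iters, x)
```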


Using $d_{cen}$ = the Newton log barrier centering direction in the algorithmic
framework of the preceding section leads to a DAS with Newton centering (DAS w/
NC) algorithm, while using the modified Newton centering
direction leads to a DAS with modified Newton centering (DAS w/ M-NC) algorithm.
In this section, we report the results of computational experiments designed to test the
robustness of the two algorithms. As a measure of robustness, we use the number of
iterations the test algorithms take in finding ε-optimal solutions when the algorithms
are started from "bad" iterates, i.e., points close to the boundary which would likely
cause problems for a pure affine scaling approach. We also report the performance of
the pure DAS algorithm as a point of reference.

Test problems. We use a subset of the standard Netlib test problems available in
the public domain (Gay [9]). We include only the problems with full rank constraint
matrices and those without a BOUNDS section in their MPS representation, as our
implementations do not perform preprocessing and do not handle bounded variables
implicitly. Since we are testing robustness with respect to bad starting points, we also
exclude problems which do not have a full dimensional interior, to prevent possible
instabilities that may arise out of a big-M approach to solving these problems from
coloring our results.

Since the Netlib test problems are in primal standard form, we solve the corresponding
dual linear programming problems. Primal optimal solutions are obtained
using the dual estimates given by (3.9). Table 3.1 presents statistics for the 13 test
problems considered, arranged according to the size of the constraint matrix.


Table 3.1. Test problems

Problem      Rows   Columns
Afiro          51       ...
ADLittle      138        36
Share2b       152        96
Scagr7        185       129
Share1b       233       117
Israel        316       174
Sc205         317       205
BandM         472       303
Scsd1         760        77
Scagr25       671       471
Scsd6        1350       147
Scsd8        2750       397
Sctap2       2300      1090
Sctap3       3340      1180


.1724807l429e4 

.143IOOOOOOOe'l 


Generating "bad" starting iterates. The process outlined below generates an
iterate which is a very rough approximation of the solution to a perturbed problem.
Note that solving the original LP from this iterate is similar to a situation encountered
when warm-starting linear programs.

• Step 1: Generate an initial interior feasible iterate $x^0$ using the Phase I method
presented in Adler et al. [1].

• Step 2: From the initial iterate $x^0$, perform up to MAXITER (= 12) iterations
of the pure DAS algorithm on the perturbed problem


The resulting iterate will most likely be near a boundary defined by several con-

of Section 2, a pure DAS approach to solving the original LP from this starting point
will most likely encounter difficulties (unless the optimal solutions to the perturbed

well the two competing centering directions overcome these difficulties.

Implementation details. All experiments were performed on an IBM 3090-600J
under AIX/ESA. The algorithms were implemented using IBM's VS FORTRAN Version 2
in double precision and with compiler options OPT(3) and VECTOR. Our
implementations store the A matrix in

matrix in packed storage. The large problems (Scsd8, Sctap2 and Sctap3) were compiled
using dynamic common blocks and executed with datasize and stacksize limits
of 128M each. The packed Cholesky routines DPPFA and DPPSL from LINPACK
were used to compute the search directions, but the Basic Linear Algebra Subprograms
(BLAS) were called from IBM's Engineering and Scientific Subroutine Library
(vectorized). The matrix A was not scaled in the implementations and no apparent
scaling problems were encountered in the test runs.
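All of the directions above require solves with the symmetric positive definite matrix $A^TS^{-2}A$. LINPACK's DPPFA and DPPSL factor and solve with such a matrix when its upper triangle is stored column by column in a one-dimensional array. The sketch below only mimics that packed layout in Python with 0-based indices. It is not the LINPACK code, and the helper names and data are hypothetical.

```python
import math


def pidx(i, j):
    """0-based position of entry (i, j), i <= j, with the upper triangle stored by columns."""
    return i + j * (j + 1) // 2


def pack_upper(M):
    n = len(M)
    return [M[i][j] for j in range(n) for i in range(j + 1)]


def packed_cholesky(ap, n):
    """Overwrite the packed SPD matrix with upper-triangular R such that M = R^T R."""
    for j in range(n):
        for i in range(j):
            t = ap[pidx(i, j)] - sum(ap[pidx(k, i)] * ap[pidx(k, j)] for k in range(i))
            ap[pidx(i, j)] = t / ap[pidx(i, i)]
        t = ap[pidx(j, j)] - sum(ap[pidx(k, j)] ** 2 for k in range(j))
        if t <= 0.0:
            raise ValueError("matrix is not positive definite")
        ap[pidx(j, j)] = math.sqrt(t)
    return ap


def packed_solve(ap, n, rhs):
    """Solve R^T R x = rhs: forward substitution with R^T, then back substitution with R."""
    z = list(rhs)
    for i in range(n):
        z[i] = (z[i] - sum(ap[pidx(k, i)] * z[k] for k in range(i))) / ap[pidx(i, i)]
    for i in range(n - 1, -1, -1):
        z[i] = (z[i] - sum(ap[pidx(i, k)] * z[k] for k in range(i + 1, n))) / ap[pidx(i, i)]
    return z


# Hypothetical use: solve (A^T S^-2 A) d = c for the DAS direction (3.1).
A = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
s = [0.8, 1.1, 1.1]
c = [1.0, 2.0]
n = len(c)
M = [[sum(row[r] * row[q] / si ** 2 for row, si in zip(A, s)) for q in range(n)]
     for r in range(n)]
R = packed_cholesky(pack_upper(M), n)
d = packed_solve(R, n, c)
residual = max(abs(sum(M[r][q] * d[q] for q in range(n)) - c[r]) for r in range(n))
print(d, residual)
```

Packed storage keeps only $n(n+1)/2$ entries of the $n \times n$ matrix, which is why it suits the symmetric normal matrices formed at every iteration.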

Results, observations and conclusions. The computational results are summarized
in Table 3.2, which reports the iteration counts and relative errors of the three
algorithms. CPU times are not reported as each method has the same order of work


Table 3.2. Performance from "bad" starting iterates.

Pure DAS

Problem      Iterations   Error
Afiro            27       1e-09
ADLittle        130*      1e+00*
Share2b           6       2e-09
Scagr7           89       1e-09
Share1b          43       2e-09
Sc205            57       1e-09
BandM            39       5e-10
Scsd1            10       4e-09
Sctap1           36       4e-09
Scagr25          78       4e-09
Scsd6            40       2e-09
Scsd8            17       3e-09
Sctap2           36       3e-09
Sctap3           42       4e-09
TOTAL           650

* Forced termination at ITER = 130


DAS  w/  NC 


As the theory of Chapter 2 predicts, the results on this small set of standard test
problems indicate the superiority of using the modified Newton centering direction
over the Newton centering direction as a hedge against nonrobustness that can be
caused by bad starting iterates. The DAS w/ M-NC algorithm was able to solve all
test problems to the prescribed optimality tolerances, while the two other algorithms
failed to do so for two test problems. The M-NC based algorithm was better than
the NC based algorithm for all the problems where the pure DAS approach clearly
had difficulty. For the problems where the supposed "bad" iterates clearly turned out
to be "good" iterates which were in fact approaching the actual solution (Share2b,
Scsd1 and Scsd8), all the algorithms were competitive.

The "bad" iterate generator for Share1b and BandM produced iterates with actual
objective values diverging to $-\infty$. Thus, the starting points were not bad in the sense
of being close to the boundary, but bad in the sense of being very far from the solution.
For these instances, centering steps were unnecessary, and thus the pure DAS approach
had a slight advantage over the centered methods, as the pure DAS step obtained a
head-start over the centered steps taken by the latter at their initial iterations.


CHAPTER 4

UNCONSTRAINED  OPTIMIZATION 


In  thia  cun.  ui  id 




by nearby iterates and their respective gradients are used in finding the next search

differ greatly in magnitude. For these problems, the gradients change rapidly and the functions behave almost as if they were nonsmooth (Lemarechal [33]). Hence,

algorithms  that  may  be  effective  in  dealing  with  ill-conditioning. 

One class of methods that is motivated by the above goal is the class of bundle methods for minimizing nonsmooth functions. This chapter is concerned with using the WLS techniques developed in the preceding two chapters to construct a new bundle strategy. We shall develop an algorithm geared towards smooth ill-conditioned and nonsmooth problems and prove global convergence for the


4.2  A Review  of  Bundle  Methods 

Bundle methods, first proposed by Lemarechal [29, 30] and Wolfe [58], are based on the idea of generating descent directions for nonsmooth functions by using information provided by nearby subgradients. The following is a review of elements (stated without proof) of convex analysis that form the basis for bundle methods. This review is based on Lemarechal [33], which gives a more detailed exposition and proves most of the propositions. Although the following exposition can be extended more generally to nonconvex locally Lipschitz functions (Kiwiel [24], Mifflin [39]), we


We consider the unconstrained minimization of a convex, not necessarily differentiable function $f$ over $\mathbb{R}^n$. Convexity implies that $f$ is locally Lipschitzian and hence differentiable almost everywhere. Thus, the subdifferential of $f$ at $x$,

$$\partial f(x) = \operatorname{conv}\{ s \in \mathbb{R}^n : s = \lim_{i \to \infty} \nabla f(x^i),\ x^i \to x,\ \nabla f(x^i) \text{ exists},\ \nabla f(x^i) \text{ converges} \}, \tag{4.1}$$

is a well-defined, nonempty, convex, and compact set. This reduces to the gradient for the case when $f$ is differentiable at $x$. For convex $f$, this set is equivalent to

$$\partial f(x) = \{ g \in \mathbb{R}^n : g^T(y - x) \le f(y) - f(x),\ \forall y \in \mathbb{R}^n \}. \tag{4.2}$$

The elements of $\partial f(x)$ are called subgradients of $f$ at $x$ and will be denoted by $f'(x)$ or, when there is no ambiguity, simply by $g$. We make the standard assumption that $f$ is defined by an oracle (subroutine) which, for any $x'$ in $\mathbb{R}^n$, returns the function value $f(x')$ and an arbitrary subgradient $f'(x')$, and hence a supporting hyperplane $r = f(x') + f'(x')^T(x - x')$.

Bundle methods seek to force a decrease in $f$ at each iteration from $x^k$ to $x^{k+1}$.

kink (point of nondifferentiability), then the negative of an arbitrary subgradient $f'(x^k)$ may not be a descent direction. Let $g_k$ be the vector in $\partial f(x^k)$ of minimum norm. This vector is

Then, it can be shown that $d = -g_k/\|g_k\|$ is a descent direction for $f$ at $x^k$. It is in fact the equivalent of the steepest descent direction, as it can be shown to be the solution to the direction finding subproblem

$$\min_{\|d\| \le 1} f'(x; d),$$

where $f'(x; d)$ is the directional derivative

$$\begin{aligned} f'(x; d) &= \lim_{t \downarrow 0} \frac{f(x + td) - f(x)}{t} && (4.3)\\ &= \max_{g \in \partial f(x)} g^T d, && (4.4) \end{aligned}$$

which exists in every direction $d$.
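As a small numerical illustration of (4.3) and (4.4), the sketch below uses a hypothetical piecewise-linear convex function $f(x) = \max_i (a_i^T x + b_i)$, chosen only because its subdifferential at $x$ is the convex hull of the active rows $a_i$. A linear function over that hull is maximized at an active row, so the formula (4.4) can be compared against the one-sided difference quotient (4.3). The function names are assumptions made for illustration.

```python
import numpy as np


def f_max_affine(A, b, x):
    """Hypothetical test function f(x) = max_i (a_i^T x + b_i)."""
    return np.max(A @ x + b)


def dir_deriv_max_formula(A, b, x, d, tol=1e-12):
    """(4.4): max of g^T d over the subdifferential.

    For a max of affine pieces the subdifferential is the convex hull of the
    active rows, and a linear function attains its max over it at one of them.
    """
    vals = A @ x + b
    active = vals >= vals.max() - tol
    return np.max(A[active] @ d)


def dir_deriv_quotient(A, b, x, d, t=1e-7):
    """(4.3): one-sided difference quotient for a small step t > 0."""
    return (f_max_affine(A, b, x + t * d) - f_max_affine(A, b, x)) / t


# f(x) = max(x1, -x1, x2) has a kink at the origin, where all pieces are active.
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]])
b = np.zeros(3)
x = np.zeros(2)
d = np.array([0.6, -0.8])
print(dir_deriv_max_formula(A, b, x, d))  # 0.6
print(dir_deriv_quotient(A, b, x, d))     # approximately 0.6
```

For piecewise-linear $f$ the quotient is exact once $t$ is small enough that no inactive piece becomes active, which is why the two values agree here.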

Because of nonsmoothness, a descent algorithm based on this steepest descent direction can still converge to a non-optimal point. Such false convergence can be avoided by including subgradient information from nearby points in the computation of the descent direction. This can be done by replacing the subdifferential with the $\varepsilon$-subdifferential

$$\partial_\varepsilon f(x) = \{ g \in \mathbb{R}^n : g^T(y - x) \le f(y) - f(x) + \varepsilon,\ \forall y \in \mathbb{R}^n \}.$$

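Note that if we set $\varepsilon = 0$, we get the subdifferential $\partial f(x)$. The elements of $\partial_\varepsilon f(x)$ are called $\varepsilon$-subgradients. It can be shown that there exists a neighborhood $V$ of $x$ such that

$$\partial f(y) \subseteq \partial_\varepsilon f(x), \quad \forall y \in V. \tag{4.5}$$

The $\varepsilon$-directional derivative

$$\begin{aligned} f'_\varepsilon(x; d) &= \inf_{t > 0} \frac{f(x + td) - f(x) + \varepsilon}{t} && (4.6)\\ &= \max_{g \in \partial_\varepsilon f(x)} g^T d && (4.7) \end{aligned}$$

exists in every direction $d$ and provides a measure of descent along $x^k + td$. This leads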
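to the direction finding subproblem

$$\min_{\|d\| \le 1} f'_\varepsilon(x^k; d), \tag{4.8}$$

which is equivalent to

$$\min \{ \|g\| : g \in \partial_\varepsilon f(x^k) \}. \tag{4.9}$$

If $g^*$ solves (4.9), then $d^* = -g^*/\|g^*\|$ solves (4.8). If $0 \notin \partial_\varepsilon f(x^k)$, then $f'_\varepsilon(x^k; d^*) < 0$ from (4.7), and (4.6) implies that $d^*$ is a direction that decreases $f$ by at least $\varepsilon$:

$$f'_\varepsilon(x^k; d^*) < 0 \;\Longrightarrow\; f(x^k + t d^*) < f(x^k) - \varepsilon \ \text{ for some } t > 0. \tag{4.10}$$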

Conversely, if $0 \in \partial_\varepsilon f(x^k)$, then $f(x^k) \le f(x) + \varepsilon$ for all $x$, i.e., $x^k$ is $\varepsilon$-optimal. Thus, assuming the full $\varepsilon$-subdifferential is available, a conceptual descent algorithm based on the solution to (4.9) can be developed to find an $\varepsilon$-optimal solution.

In practice, however, only one subgradient is available at a point $x^k$. Instead of using the full $\varepsilon$-subdifferential, bundle-type algorithms replace $\partial_\varepsilon f(x^k)$ by some inner approximating polytope $P$ (the bundle) and solve (4.9) with $\partial_\varepsilon f(x^k)$ replaced by $P$. One such $P$ can be developed in the following manner.

Let $g_i = f'(y^i) \in \partial f(y^i)$, $i = 1, \ldots, k$ be a collection of subgradients already computed from previous iterates $y^i$, $i = 1, \ldots, k$, where $x^k = y^j$ for some $j \in \{1, \ldots, k\}$. Let

$$p_i = f(x^k) - f(y^i) - g_i^T(x^k - y^i) \tag{4.11}$$

be the linearization error of $f$ at $x^k$ with respect to the supporting hyperplane at $y^i$. Then

$$P = \Big\{ \sum_i \lambda_i g_i : \lambda_i \ge 0,\ \sum_i \lambda_i = 1,\ \sum_i \lambda_i p_i \le \varepsilon \Big\} \tag{4.12}$$

can be shown to be contained in $\partial_\varepsilon f(x^k)$. The direction finding subproblem (4.8) with $\partial_\varepsilon f(x^k)$ replaced by $P$ then reduces to the quadratic programming problem

$$d = -\arg\min \{ \|g\|^2 : g \in P \}. \tag{4.13}$$
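To see (4.11) and (4.12) on concrete numbers, the sketch below computes the linearization errors for a hypothetical convex function $f(x) = |x_1| + |x_2|$ (with `np.sign` as a valid subgradient choice, including 0 at a zero coordinate) and checks whether a given weight vector $\lambda$ defines a point of $P$. The helper names are illustrative assumptions, not part of the original method.

```python
import numpy as np


def linearization_errors(xk, fk, Y, fY, G):
    """(4.11): p_i = f(x^k) - f(y^i) - g_i^T (x^k - y^i) for each row y^i of Y."""
    return fk - fY - np.einsum("ij,ij->i", G, xk - Y)


def in_bundle_polytope(lam, p, eps, tol=1e-12):
    """True if the weights lam define a point sum_i lam_i g_i of P in (4.12)."""
    return bool(np.all(lam >= -tol)
                and abs(lam.sum() - 1.0) <= tol
                and lam @ p <= eps + tol)


# Hypothetical example: f(x) = |x1| + |x2|; sign(x) is a subgradient.
f = lambda x: np.abs(x).sum(axis=-1)
g = np.sign
Y = np.array([[-1.0, 1.0], [1.0, -1.0], [0.5, 0.5]])   # last point is x^k
xk = Y[2]
p = linearization_errors(xk, f(xk), Y, f(Y), g(Y))
print(p)  # [1. 1. 0.]
lam = np.array([0.25, 0.25, 0.5])
print(in_bundle_polytope(lam, p, eps=0.5), in_bundle_polytope(lam, p, eps=0.4))
```

Because $f$ is convex, every $p_i$ is nonnegative and the error of the current point is zero, which is what makes $\lambda$ concentrated on $x^k$ always feasible in (4.12).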

If $P$ is a good approximation of $\partial_\varepsilon f(x^k)$, then the resulting direction yields an acceptable decrease. Otherwise, a new subgradient $g^+$ at $y^+ = x^k + td$ for some small $t > 0$ is obtained which, by (4.5), is an $\varepsilon$-subgradient at $x^k$. It can be shown that

$$P^+ = \operatorname{conv}(P \cup \{g^+\})$$

is a better approximation of $\partial_\varepsilon f(x^k)$ than $P$. A null step is then performed by solving (4.13) using $P^+$ and starting the line search from the same $x^k$. In sum, we have the algorithm:


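1) Given the current iterate $x^k$ and the trial points $y^i$, $i = 1, \ldots, k$, yielding the subgradients $g_i = f'(y^i)$, let $p_i$, $i = 1, \ldots, k$ be the linearization errors as defined above. Let $\varepsilon > 0$ be the control parameter.

2) Compute the direction $d^k$: solve the quadratic program

$$\begin{aligned} \min\ & \tfrac{1}{2}\|d\|^2 && (4.14)\\ \text{s.t.}\ & d = -\textstyle\sum_i \lambda_i g_i,\\ & \lambda_i \ge 0,\quad \textstyle\sum_i \lambda_i = 1,\quad \textstyle\sum_i \lambda_i p_i \le \varepsilon. \end{aligned}$$

3) Check for stopping: if $\|d^k\|$ is small then stop [or reduce $\varepsilon$ and go to Step 2].

4) Do the line search: let $y^{k+1} = x^k + t d^k$ for some $t > 0$ such that $g_{k+1} \in \partial f(y^{k+1})$ has $g_{k+1}^T d^k$ "large enough", yielding

   either a serious step: $f(y^{k+1})$ "small enough", yielding a decrease of "almost $\varepsilon$";

   or a null step: assign $g_{k+1}$ the weight $p_{k+1} = f(x^k) - f(y^{k+1}) + t\, g_{k+1}^T d^k \le \varepsilon$;

5) Update the weights $p_i$, $i = 1, \ldots, k+1$; replace $k$ by $k+1$ and loop to Step 2.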
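Step 2 is a small quadratic program in the weights $\lambda$. As a sketch only, the block below solves (4.14) with SciPy's general-purpose SLSQP solver rather than a specialized bundle QP code such as the one used in M1FC1; the helper name and the starting point (the vertex with the smallest $p_i$, feasible because $p_k = 0$) are our assumptions.

```python
import numpy as np
from scipy.optimize import minimize


def bundle_qp_direction(G, p, eps):
    """Solve (4.14) in the weights lam and return (d, lam).

    min 1/2 ||sum_i lam_i g_i||^2  s.t.  lam >= 0, sum(lam) = 1, lam^T p <= eps,
    with d = -sum_i lam_i g_i.  Assumes some p_i <= eps (e.g. p_k = 0).
    """
    k = G.shape[0]
    Q = G @ G.T                                    # Gram matrix g_i^T g_j
    constraints = [
        {"type": "eq", "fun": lambda lam: lam.sum() - 1.0},
        {"type": "ineq", "fun": lambda lam: eps - lam @ p},  # SLSQP wants >= 0
    ]
    lam0 = np.zeros(k)
    lam0[np.argmin(p)] = 1.0                       # feasible starting vertex
    res = minimize(lambda lam: 0.5 * lam @ Q @ lam, lam0,
                   jac=lambda lam: Q @ lam, method="SLSQP",
                   bounds=[(0.0, None)] * k, constraints=constraints,
                   options={"ftol": 1e-12, "maxiter": 200})
    lam = res.x
    return -G.T @ lam, lam


G = np.array([[1.0, 1.0], [1.0, -1.0]])            # g_k = (1, 1) with p_k = 0
p = np.array([0.0, 1.0])
d_loose, _ = bundle_qp_direction(G, p, eps=0.5)    # weight on g_1 may reach 0.5
d_tight, _ = bundle_qp_direction(G, p, eps=0.2)    # weight on g_1 capped at 0.2
print(d_loose, d_tight)
```

With the loose $\varepsilon$ the two subgradients are averaged and the direction is $(-1, 0)$; tightening $\varepsilon$ pushes the result back towards $-g_k$, which is the drift towards steepest descent discussed below.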

Details of the line search (quantifying "large enough", "small enough" and "almost $\varepsilon$") can be found in Lemarechal [33]. It can be shown that if infinitely many null steps are performed, then the solution of (4.14) tends to zero. Thus the stopping and $\varepsilon$-reduction criterion in Step 3 is valid. In this framework, a "serious" step corresponds to a descent step while a "null" step corresponds to an enhancement of the current bundle with the subgradient information at $y^{k+1}$ when the objective function does

The above algorithm has been implemented in the FORTRAN code M1FC1 by

robust, as it is sensitive to the choice of the $\varepsilon$ parameter (Schramm and Zowe [43]). Furthermore, as $\varepsilon \to 0$, the algorithm reduces to the steepest descent method if $f$ is smooth. Modern bundle methods (e.g., Kiwiel [25] and Schramm and Zowe [43]) attempt to overcome these difficulties. We shall take a different approach using the weighted least squares techniques we have developed.

4.3 A Bundle Method based on Weighted Least Squares
Note that in the above model, the parameter $\varepsilon$ is used both in the direction finding subproblem and in the test for the null step criterion. This causes a difficulty which can be illustrated by going back to Figure 4.1. Suppose that $y^{k-1}$ with subgradient $g_{k-1}$ is the iterate closest to the current iterate $x^k$, i.e., its linearization error $p_{k-1}$ is the smallest among the $p_i$'s, with the exception of $p_k$, which is of course zero. If $\varepsilon$ is chosen such that it is close to or even greater than $p_{k-1}$, it is possible that $x^k$ would already be $\varepsilon$-optimal, causing several null step iterations before a direction $d$ of small enough norm is produced to give a conclusion of $\varepsilon$-optimality. In this instance, $\varepsilon$ may not be small enough for acceptable optimality, necessitating a reduction in the value of $\varepsilon$. On the other hand, if $\varepsilon$ is chosen so that $\varepsilon \ll p_{k-1}$, then the subgradient $g_{k-1}$ is not an $\varepsilon$-subgradient and will have little effect on the direction finding subproblem (4.14). The resulting solution vector will be dominated by the steepest descent vector $-g_k$ which, as constructed, is a very bad direction. Worse, as $\varepsilon \to 0$, the resulting direction reduces exactly to the steepest descent direction.

In the above bundle formulation, descent is forced only with respect to the $\varepsilon$-subgradients that have been generated, i.e., those with $p_i \le \varepsilon$. After a serious step, a descent of at least $\varepsilon$ has been made, thus making all subgradients but the current one non-$\varepsilon$-subgradients. Thus, the direction finding subproblem may not give descent with respect to the nearest subgradient even if such a property is advantageous, as in

To overcome this difficulty, we shall devise a direction finding subproblem that generates descent with respect to the nearest subgradients even if they are not of the $\varepsilon$ type.


where $p_i$ and $g_i = f'(y^i)$ are as in the preceding section and $s_i > 0$ are some positive parameters (compare with Chapter 3). Then the solution $d$ to (4.15) would tend to satisfy $g_i^T d$ close to $-1$ for $s_i$ and $p_i$ small enough. Without loss of generality, assume that the current iterate is $x^k = y^k$ and hence $p_k = 0$. Our first priority is, of course, to force descent with respect to $g_k$, i.e., to force $g_k^T d < 0$. From Corollary 1 of Chapter 1, this can be done by setting $s_k = 0$, thus making the weight on the $k$th equation of (4.15) infinite and forcing

$$g_k^T d = -1.$$

Second, we need to have descent with respect to the nearest subgradients, i.e., those with the smallest $p_i$'s. To do this, we can set

$$s_i = p_i + \nu, \quad \forall i \ne k,$$

for some $\nu > 0$. This leads to the direction finding subproblem

$$\begin{aligned} \min_d\ & \tfrac{1}{2}\sum_{i=1}^{k-1} \frac{(g_i^T d + 1)^2}{p_i + \nu} && (4.16)\\ \text{s.t.}\ & g_k^T d = -1, \end{aligned}$$

where $\nu$ is a given small positive parameter (a small negative power of ten in our implementation).

This direction finding subproblem tends to make $d$ have a negative inner product with subgradients that are generated near the iterate $x^k$, i.e., with subgradients for

or is actually a point of nondifferentiability, the WLS direction $d$ provides a mechanism for taking into consideration nearby subgradients, in particular $\varepsilon$-subgradients with

For intermediate size $p_i$'s this deviation may still be small enough to have a negative

the subproblem (4.16) is independent of $\varepsilon$ and depends only on the $p_i$ and, to a small extent, on $\nu$. Hence in a bundle algorithm incorporating this direction finding subproblem, $\varepsilon$ would come into play only in the test for a null step. Before we can give the mechanics of such an algorithm, we first turn to the problem of solving this subproblem and perhaps refining it further.


Consider the direction finding subproblem (4.16). Let $x^i$, $i = 1, \ldots, k$ be the previous iterates/trial points, with the current iterate $x^k$ having the lowest function value. Define the matrix $A \in \mathbb{R}^{(k-1) \times n}$ to be the matrix whose $i$th row is $g_i^T$, $i = 1, \ldots, k-1$. Without loss of generality, suppose that $k > n$ and that $A$ is of full column rank $n$ (the case $k \le n$ is an interesting case that relates the method to conjugate gradients; it will be discussed in a later subsection). Define $s_i = p_i + \nu$, $\forall i = 1, \ldots, k-1$, and let $S = \operatorname{diag}(s_1, \ldots, s_{k-1})$. Then (4.16) can be written as

$$\begin{aligned} \min\ & \tfrac{1}{2}\sum_{i=1}^{k-1} s_i^{-1}(g_i^T d + 1)^2 = \tfrac{1}{2}\|S^{-1/2}(Ad + e)\|^2 && (4.17)\\ \text{s.t.}\ & g_k^T d = -1. \end{aligned}$$


The solution to this subproblem is characterized by the following

(4.19)

(4.20)

(4.21)


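$$\mathcal{N}(x^k; \nu) = \{ x : Ax \le b \},$$

where $b_i = g_i^T x^k + p_i + \nu$. Expanding $p_i$ from (4.11), we see that

$$\mathcal{N}(x^k; \nu) = \{ x : f(x^i) + g_i^T(x - x^i) \le f(x^k) + \nu,\ i = 1, \ldots, k-1 \}. \tag{4.22}$$

Consider now the level set of the function $f(x)$ at the level $f(x^k) + \nu$:

$$\mathcal{L}(x^k; \nu) = \{ x : f(x) \le f(x^k) + \nu \}.$$

From the subgradient inequality (4.2) we get $f(x) \ge f(x^i) + g_i^T(x - x^i)$ for every $i$, and hence

$$\mathcal{L}(x^k; \nu) \subseteq \mathcal{N}(x^k; \nu),$$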
$x^i$, $i = 1, \ldots, k-1$. Hence, this direction combines affine scaling and modified Newton centering on level set approximations of the function $f(x)$, in which the supporting hyperplanes at $x^i$, $i < k$ serve as constraints and $-g_k$ serves as the LP objective. Intuitively, the affine scaling direction $d_a$ induces descent with respect to $g_k$, while the centering direction $d_r$ induces descent with respect to nearby subgradients.

Note that only the iterates with small $p_i$ significantly affect the direction finding subproblem. Hence it is possible to select only a subset (e.g., the promising ones, i.e., those with small $p_i$) of the subgradients $g_i$, $i < k$ to be included in the rows of $A$. We can thus impose an upper bound MMAX on the number of rows of $A$, i.e., the number of elements of the bundle.

As a final refinement, we will require, in the convergence proof for the smooth case discussed in a later subsection, that the magnitude of the direction $d^k$ be bounded. This can be induced by a trust region constraint or, equivalently, by adding the regularizing term $\tfrac{1}{2}\mu\|d\|^2$ for some $\mu > 0$ to the objective function of (4.17). We can now formally state the refined direction finding subproblem that we shall use in our algorithm.

Let MMAX, $\mu$, $\nu > 0$ be given positive parameters. Let $x^k$ be the current iterate and let $y^i$, $i = 1, \ldots, k$ be the trial points generated so far, with $x^k = y^j$ for some $j$. Let $s_i = p_i + \nu$, where $p_i$ is the $i$th linearization error as defined in (4.11). Let $A$ be the matrix whose rows are formed by the transposes of a collection of up to MMAX subgradients $g_i$ with $y^i \ne x^k$. In practice, we can choose to include only the newest trial points or those with the smallest $p_i$. Let $S$ be the diagonal matrix of the $s_i$'s corresponding to the $g_i$'s included in $A$. Then the next search direction $d^k$ shall be chosen as the solution to

$$\begin{aligned} \text{(DFS)}\quad \min\ & \tfrac{1}{2}\mu\|d\|^2 + \tfrac{1}{2}\|S^{-1/2}(Ad + e)\|^2 && (4.23)\\ \text{s.t.}\ & f'(x^k)^T d = -1. && (4.24) \end{aligned}$$

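Theorem 5. The solution $d^k$ to (DFS) is given by

$$d^k = d_r + \tau d_a, \tag{4.25}$$

where

$$\begin{aligned} d_r &= -(\mu I + A^T S^{-1} A)^{-1} A^T S^{-1} e, && (4.26)\\ d_a &= -(\mu I + A^T S^{-1} A)^{-1} f'(x^k), && (4.27)\\ \tau &= -\frac{f'(x^k)^T d_r + 1}{f'(x^k)^T d_a}. && (4.28) \end{aligned}$$

Proof. Follows directly from the KKT conditions for the problem (DFS). □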
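Theorem 5 reduces (DFS) to two linear solves with the same matrix $H = \mu I + A^T S^{-1} A$. The sketch below evaluates (4.25)-(4.28) directly; the random data and the helper name are assumptions for illustration. It can be cross-checked against the KKT system $H d + \lambda g_k = -A^T S^{-1} e$, $g_k^T d = -1$, from which the theorem follows.

```python
import numpy as np


def dfs_direction(A, s, gk, mu):
    """Closed-form solution (4.25)-(4.28) of (DFS).

    Minimizes 1/2 mu ||d||^2 + 1/2 ||S^{-1/2}(A d + e)||^2 subject to
    gk^T d = -1, where S = diag(s) and e is the vector of ones.
    """
    s_inv = 1.0 / s                                          # diagonal of S^{-1}
    H = mu * np.eye(A.shape[1]) + A.T @ (s_inv[:, None] * A)  # mu I + A^T S^{-1} A
    d_r = -np.linalg.solve(H, A.T @ s_inv)    # (4.26): A^T S^{-1} e
    d_a = -np.linalg.solve(H, gk)             # (4.27)
    tau = -(gk @ d_r + 1.0) / (gk @ d_a)      # (4.28)
    return d_r + tau * d_a, d_r, d_a, tau


rng = np.random.default_rng(1)
A = rng.normal(size=(6, 3))           # rows: bundle subgradients g_i^T
s = rng.uniform(0.1, 2.0, size=6)     # weights s_i = p_i + nu (made up here)
gk = rng.normal(size=3)               # current (sub)gradient
d, d_r, d_a, tau = dfs_direction(A, s, gk, mu=0.5)
print(gk @ d)                         # -1 up to rounding
```

Since $H$ is symmetric positive definite, $g_k^T d_a = -g_k^T H^{-1} g_k < 0$ whenever $g_k \ne 0$, so the division in (4.28) is always well defined.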
4.3.2 A hybrid line search and central cutting plane procedure

After the search direction is computed, a step along the direction must be made to find the next trial point. We shall basically use the inexact line search procedure often used in smooth methods, along with some enhancements that exploit the structure of

Most line search procedures for minimizing smooth functions are based on finding a stepsize $\lambda_k$ satisfying the Wolfe [56, 57] conditions:

$$\begin{aligned} f(x^k + \lambda_k d^k) &\le f(x^k) + \sigma_1 \lambda_k f'(x^k)^T d^k, && (4.29)\\ f'(x^k + \lambda_k d^k)^T d^k &\ge \sigma_2 f'(x^k)^T d^k, && (4.30) \end{aligned}$$

with $0 < \sigma_1 < \sigma_2 < 1$ and where $d^k$ is a descent direction from $x^k$. Before adapting this test to nonsmooth functions, one must note that
1. If $x^k$ is a kink, then the inner product $f'(x^k)^T d^k$ is only an underestimate of the directional derivative (4.4), as it only includes information from one subgradient.

2. Furthermore, it is possible that $d^k$ is not a descent direction and hence, the

The first concern does not usually present a problem, but the second one most definitely does. It can be handled by the use of null steps: if the line search indicates that the direction would not give sufficient decrease, then the iterate stays where it is, but a point along the direction is generated to enhance the bundle information (see, e.g., [31]). Because of the limitations of the traditional steepest descent bundle direction, one must generate a nearby point that would give another $\varepsilon$-subgradient as

lem significantly. Hence, a line search is always necessary to find acceptable points for both serious (descent) steps and null steps.
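The two Wolfe tests (4.29) and (4.30) are cheap to evaluate once $f$ and a (sub)gradient are available at the trial point. The sketch below checks both on a hypothetical ill-conditioned quadratic; in the algorithm of this section (4.29) is called the right-side test (it caps the step from above) and (4.30) the left-side test (it bounds the step from below). The default values $\sigma_1 = 10^{-4}$ and $\sigma_2 = 0.9$ are common textbook choices assumed here, not values taken from the text.

```python
import numpy as np


def wolfe_tests(f, grad, x, d, lam, sigma1=1e-4, sigma2=0.9):
    """Evaluate the Wolfe conditions at the step lam along d.

    Returns (right_ok, left_ok): right_ok is the sufficient-decrease test (4.29),
    left_ok is the curvature test (4.30).  sigma1/sigma2 defaults are assumed.
    """
    slope0 = grad(x) @ d
    if slope0 >= 0:
        raise ValueError("d must be a descent direction")
    y = x + lam * d
    right_ok = f(y) <= f(x) + sigma1 * lam * slope0
    left_ok = grad(y) @ d >= sigma2 * slope0
    return bool(right_ok), bool(left_ok)


# Ill-conditioned quadratic f(x) = 0.5 x^T D x with the steepest descent direction.
D = np.diag([1.0, 10.0])
f = lambda x: 0.5 * x @ D @ x
grad = lambda x: D @ x
x0 = np.array([1.0, 1.0])
d0 = -grad(x0)
lam_exact = (d0 @ d0) / (d0 @ D @ d0)    # exact minimizer along d0
for lam in (1e-6, lam_exact, 1.0):
    print(lam, wolfe_tests(f, grad, x0, d0, lam))
```

A tiny step passes (4.29) but fails (4.30), the exact step passes both, and a long step fails (4.29): these three outcomes correspond to the forward-track, serious and null cases described next.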

Modern bundle methods (Kiwiel [39], Schramm and Zowe [43]) use direction finding subproblems which can use non-$\varepsilon$ subgradients to update the bundle. The advan-

These new algorithms are in fact closer to the spirit of trust region methods. As our direction finding subproblem can also take advantage of non-$\varepsilon$ subgradients, we shall develop our algorithm along similar lines.

Recall that our direction is a modified Newton centering direction on the set

$$\mathcal{N}(x^k; \nu) = \{ x : f(x^i) - g_i^T(x^i - x) \le f(x^k) + \nu,\ i = 1, \ldots, k-1 \}. \tag{4.31}$$

If we add to $\mathcal{N}(x^k; \nu)$ the hyperplane at $x^k$, we get the set

$$\mathcal{N}^k = \mathcal{N}(x^k; \nu) \cap \{ x : g_k^T(x - x^k) \le 0 \}.$$

This polyhedral set contains the level set

$$\mathcal{L}^k = \{ x : f(x) \le f(x^k) \}.$$



The direction $d^k$ is the limit of the modified Newton centering direction on $\mathcal{N}^k$ as $x$ approaches the boundary point $x^k$ of $\mathcal{N}^k$. Hence, we can use this direction in a central cutting plane framework in the following manner. We use the standard ratio test to get the boundary point, say $w^k$, across from $x^k$ along the direction $d^k$ (we assume for the moment that $\mathcal{N}^k$ is already compact). Since the centering direction tends to move away from the nearby constraints of $\mathcal{N}^k$, we can simply take the next trial point at the halfway mark from $x^k$ to $w^k$, i.e., we let $y^{k+1} = x^k + \alpha_k (w^k - x^k)$, $\alpha_k = .5$. We call the oracle at $y^{k+1}$ to obtain the function value and subgradient information, resulting in a central cutting plane at $y^{k+1}$.

We now decide whether to move to $y^{k+1}$ for the next iteration (serious step) or to stay at $x^k$ and make a null step. For this we use the Wolfe conditions (4.29) and (4.30). If $y^{k+1}$ satisfies these conditions, then we have made an acceptable decrease for a serious step: $x^{k+1} = y^{k+1}$. In this case, the next polytope $\mathcal{N}^{k+1}$ would be significantly smaller than $\mathcal{N}^k$ as well.

Otherwise, $y^{k+1}$ violates one or both of the Wolfe conditions. If the right-side condition (4.29) was violated, then we make a null step: $x^{k+1} = x^k$. To ensure convergence in the smooth case, we also need to decrease the backtrack parameter to, say, $\alpha_{k+1} = \gamma\alpha_k$ for some $0 < \gamma < 1$. This is in the spirit of backtracking line searches or decreasing trust region sizes.

If the right-side condition was satisfied but the left-side condition (4.30) was not, then we know that $f(y^{k+1})$ is smaller than $f(x^k)$ but $y^{k+1}$ is "too close" to $x^k$. In this case, we line search ("forward track") from $y^{k+1}$ to get a point that would satisfy both Wolfe conditions. This means that we shall try to force a serious step to an $x^{k+1} \ne x^k$. In the smooth case, we can guarantee that we shall find such a point. (We cannot yet guarantee that this will be the case when $f$ is nonsmooth.)
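The case analysis above maps the two Wolfe outcomes to a step type and the next backtrack parameter. A minimal sketch follows; the function name, the string labels and the default $\gamma = 0.5$ are assumptions made for illustration, and the forward-tracking line search itself is not shown.

```python
def classify_trial_point(right_ok, left_ok, alpha, gamma=0.5):
    """Step type and next backtrack parameter after testing y^{k+1}.

    right_ok / left_ok are the outcomes of the Wolfe tests (4.29) / (4.30).
    gamma in (0, 1) shrinks alpha on null steps; the default is illustrative.
    """
    if not right_ok:
        # Insufficient decrease: stay at x^k (null step) and backtrack.
        return "null", gamma * alpha
    if left_ok:
        # Both conditions hold: accept y^{k+1} as a serious step, reset alpha.
        return "serious", 0.5
    # Decrease achieved but y^{k+1} is "too close": forward track, then reset.
    return "forward-track", 0.5


print(classify_trial_point(False, True, 0.5))   # ('null', 0.25)
```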


We can then set $k \leftarrow k + 1$ and repeat the iteration, taking care to reset $\alpha_k$ back to .5 after each serious step. So far we have assumed that $\mathcal{N}^k$ is bounded. This will not be the case in the early iterations. What we do in this case is to impose a standard step restriction $\|x - x^k\|_\infty \le \Delta_k$ for some $\Delta_{\max} \ge \Delta_k > 0$, where $\Delta_{\max}$ is some large number. We can adjust $\Delta_k$, say by $\Delta_{k+1} = \beta\Delta_k$ with $\beta > 1$, after each serious step to make sure it does not become overly restrictive. Before we give a formal description of the algorithm, we first turn to the final requirement: the stopping criterion.

gorithm for the case when $f$ is smooth, in which a stopping criterion requiring $\|g_k\|$ to be sufficiently small would suffice. This is, however, not applicable to the nonsmooth case. One possible stopping criterion is the affine scaling stopping criterion used in Chapter 3 for the dual variables associated with the affine scaling component $d_a$ of the direction $d^k$. The theoretical underpinnings of such a stopping criterion for this problem still need to be investigated. For our implementation we simply use the stopping test

for a given tolerance parameter.


4.3.3 The affine scaling with centering bundle method


Let the parameters MMAX, $\mu$, $\nu$, $\Delta_{\max}$, $\gamma$, $\beta$, $\sigma_1$, and $\sigma_2$ be given as defined in the preceding discussions. Let $y^1$ be the starting trial point with subgradient $f'(y^1)$ and function value $f(y^1)$. Let $x^1 = y^1$. Initialize $\Delta_1 = 1$, $\alpha_1 = .5$. Set $k = 1$. From iteration $k$, the algorithm proceeds as follows.


1) Compute $d^k$ from the direction finding subproblem (DFS).

2) Using the standard ratio test, compute the maximum stepsize $\lambda_{\max}$ from $x^k$ along the direction $d^k$ on the polyhedral set

$$\mathcal{X}^k = \mathcal{N}(x^k; \nu) \cap \{ x : \|x - x^k\|_\infty \le \Delta_k \},$$

i.e., the largest $\lambda > 0$ for which $x^k + \lambda d^k \in \mathcal{X}^k$.

3) Set $\lambda_k = \alpha_k \lambda_{\max}$.

4) Set $y^{k+1} = x^k + \lambda_k d^k$.

6) (Right-side test.) If the right-side Wolfe condition (4.29) is NOT satisfied then go to 8); else continue to 7).

7) (Left-side test.) If the left-side Wolfe condition (4.30) is satisfied then go to 9); else go to 10).

8) (Null step.) Set $x^{k+1} = x^k$. Set $\alpha_{k+1} = \gamma\alpha_k$ (backtrack). Go to 11).

9) (Serious step a.) Set $x^{k+1} = y^{k+1}$. Set $\alpha_{k+1} = .5$ and set $\Delta_{k+1} = \min\{\beta\Delta_k, \Delta_{\max}\}$. Go to 11).

10) (Serious step b.) Line search from $y^{k+1}$ along the direction $d^k$ for an $x^{k+1} = x^k + \lambda d^k$ satisfying the left-side Wolfe condition (4.30). Change $y^{k+1}$ into the $x^{k+1}$ found. Set $\alpha_{k+1} = .5$ and set $\Delta_{k+1} = \min\{\beta\Delta_k, \Delta_{\max}\}$. Go to 11).


4.4 Global Convergence for Smooth Functions

To prove convergence of the algorithm for the smooth case, let us first state our assumptions.


1. The level set $X_0 := \{ x : f(x) \le f(x^1) \}$ is bounded.


2. $f$ is differentiable and its gradient is Lipschitz continuous in some open neighborhood of $X_0$. Hence, there exists some $\Gamma > 0$ such that

$$\|\nabla f(x)\| \le \Gamma, \quad \forall x \in X_0. \tag{4.32}$$

Note that we do not assume convexity here. We shall first prove convergence of a line search algorithm (without null steps) that applies generally to functions satisfying the above assumptions. We shall then prove that the affine scaling with centering bundle method above is globally convergent for smooth convex functions.
4.4.1 Convergence of a line search variant for general smooth functions

Let the parameters $0 < \mu$, $\text{MMAX} < \infty$ and $\nu > 0$ be given. At iteration $k$, let $\{x^i\}$ be the set of generated iterates, $x^k$ the current iterate satisfying $f(x^k) \le f(x^i)$, $\forall i$, and $g_i = \nabla f(x^i)$. Define $s_i$, $i < k$, to be

$$s_i = p_i + \nu \quad \text{if } f \text{ is convex}, \tag{4.33}$$

where $p_i$ is the linearization error defined in (4.11). Note that

$$s_i \ge \nu > 0, \quad \forall i. \tag{4.34}$$

Let $J^k \subseteq \{1, \ldots, k-1\}$ be an index set with $M_k = |J^k| \le \text{MMAX}$. This set is the collection of indices that may be considered relevant, e.g., those generated near $x^k$. Without loss of generality and for ease of notation, suppose that $J^k = \{1, 2, \ldots, M_k\}$. Define $A$ as the matrix whose $j$th row is $g_j^T$, $j \in J^k$. Similarly, define $S = \operatorname{diag}(s_1, \ldots, s_{M_k})$. Let $e$ be the vector of ones of the appropriate dimension.

A vector $d^k$ is a descent direction for $f$ at $x^k$ if $g_k^T d^k < 0$. The angle $\theta_k$ between $-g_k$ and $d^k$ is obtained from

$$\cos\theta_k = \frac{-g_k^T d^k}{\|g_k\|\,\|d^k\|}.$$

The direction that we shall use is the affine scaling with centering bundle direction obtained from the solution of (DFS).

We shall prove that an inexact line search algorithm based on the above direction,

$$x^{k+1} = x^k + \lambda_k d^k, \tag{4.35}$$

where $d^k$ is obtained using (DFS) and where the steplength $\lambda_k$ satisfies the Wolfe conditions (4.29) and (4.30) (where $f'(x^k)$ now represents the gradient at $x^k$).

Before we state our global convergence theorem, we state without proof three lemmas that we shall need later. The proofs of the lemmas can be obtained from the cited literature.

Lemma 1. Suppose that Assumptions 1 hold, and consider any iteration of the form (4.35), where $d^k$ is a descent direction and $\lambda_k$ satisfies the Wolfe conditions (4.29)-(4.30). Then

$$\sum_k \cos^2\theta_k \, \|g_k\|^2 < \infty. \tag{4.37}$$

Proof. See Wolfe [56, 57]. See also Zoutendijk [60]. Condition (4.37) shall be referred to as the Zoutendijk condition. □

Lemma 2 (Wilkinson [55], pp. …-97). Suppose $B = A + r\,cc^T$, where $A$ is a symmetric $n \times n$ matrix, $c$ is a vector of unit norm, and $r$ a scalar. Denote the $i$th eigenvalue of $A$ by $\lambda_i(A)$, $i = 1, \ldots, n$, arranged in increasing order of magnitude. Then there exist nonnegative $m_1, \ldots, m_n$ such that

$$\lambda_i(B) = \lambda_i(A) + m_i r, \quad i = 1, \ldots, n.$$
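Lemma 2 can be checked numerically on a random symmetric matrix. The sketch below orders eigenvalues algebraically in ascending order (which is what `np.linalg.eigvalsh` returns) and recovers the multipliers $m_i$. Two facts beyond the displayed statement are also visible: the $m_i$ sum to 1, because $\operatorname{trace}(B) - \operatorname{trace}(A) = r\|c\|^2 = r$, and each $m_i \le 1$ by Weyl's inequality. The matrix size, seed and $r$ are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
M = rng.normal(size=(n, n))
A = (M + M.T) / 2                   # symmetric n x n matrix
c = rng.normal(size=n)
c /= np.linalg.norm(c)              # unit-norm vector
r = 0.7                             # scalar in B = A + r c c^T
B = A + r * np.outer(c, c)

lam_A = np.linalg.eigvalsh(A)       # eigenvalues in ascending order
lam_B = np.linalg.eigvalsh(B)
m = (lam_B - lam_A) / r             # multipliers m_i of Lemma 2
print(np.round(m, 4), m.sum())
```

Repeated application of this bound is what keeps the spectrum of $H = \mu I + \sum_i r_i c_i c_i^T$ inside a fixed interval in the proof below.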


Lemma 3. If $d^k$ is a descent direction for $f$ at $x^k$ and $f$ is bounded below along $d^k$, then, if $0 < \sigma_1 < \sigma_2 < 1$, there exists an interval of acceptable points satisfying both Wolfe conditions (4.29) and (4.30) along the ray $x^k + \lambda d^k$, $\lambda > 0$.


Let $\{x^k\}$ be


$$\|d^k\| \le T$$

for some constant $T$, i.e., that $\|d^k\|$ is bounded above. This would contradict (4.44), thus proving (4.38).

We now show that $\|d^k\|$ is bounded above. For ease of notation, let $d = d^k$. Recall that $d = d_r + \tau d_a$ (Theorem 5). It is then enough to show that $\|d_r\|$, $|\tau|$, and $\|d_a\|$ are bounded.

Consider

$$H = \mu I + A^T S^{-1} A,$$

where $A$ and $S$ are defined as in the preliminaries. Then $H$ is symmetric positive definite. Recall from (4.32) and (4.43) that $\|g_i\|$ is bounded below and above. Furthermore, $s_i$ is bounded below for all $i$ (4.34). It then follows that $H$ can be written as

$$H = \mu I + \sum_{i \in J^k} s_i^{-1} g_i g_i^T = \mu I + \sum_{i \in J^k} r_i c_i c_i^T,$$

where $r_i = \|g_i\|^2/s_i$ are positive scalars bounded below and above and $c_i = g_i/\|g_i\|$ are vectors of unit norm for all $i \in J^k$. From repeated application of the Wilkinson Lemma, it follows that the eigenvalues of $H$ are bounded below and above.
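From the boundedness assumptions on $\|g_j\|$ and $s_j$ for all $j \in J^k$, it follows that

$$h = -A^T S^{-1} e = -\sum_{j \in J^k} s_j^{-1} g_j$$

has bounded norm, i.e., $\|h\| \le N_h$ for some $N_h > 0$. Let $\omega_1, \omega_n$ be the smallest and largest eigenvalues, respectively, of $H^{-1}$. Since the eigenvalues of $H$ (and therefore of $H^{-1}$) are bounded, there exist $\underline{\omega}, \overline{\omega} > 0$ such that $\underline{\omega} \le \omega_1 \le \omega_n \le \overline{\omega}$. Then

$$\|d_r\| = \|H^{-1} h\| \le \omega_n \|h\| \le \overline{\omega} N_h.$$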

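Also

$$\|d_a\| = \|H^{-1} g_k\| \le \omega_n \|g_k\| \le \overline{\omega}\,\|g_k\|,$$

and, since $g_k^T d_a = -g_k^T H^{-1} g_k \le -\omega_1 \|g_k\|^2$,

$$|\tau| = \frac{|g_k^T d_r + 1|}{g_k^T H^{-1} g_k} \le \frac{\|g_k\|\,\|d_r\| + 1}{\omega_1 \|g_k\|^2},$$

which is then bounded since $\omega_1$ and $\|d_r\|$ are bounded and $\|g_k\|$ is bounded above and, by (4.43), away from zero. Hence $\|d^k\|$ is bounded and the theorem follows by contradiction. □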
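The boundedness argument can be checked numerically. The sketch below uses explicit spectral bounds that are our choice for illustration rather than the constants of the proof: $\lambda_{\min}(H) \ge \mu$, and $\lambda_{\max}(H) \le \mu + \sum_i r_i$ because $H - \mu I$ is positive semidefinite with trace $\sum_i r_i$. With them, $\|d_r\| \le \|h\|/\mu$, $\|d_a\| \le \|g_k\|/\mu$ and $|\tau| \le (\|g_k\|\,\|d_r\| + 1)\,\lambda_{\max}(H)/\|g_k\|^2$. The data are random and made up.

```python
import numpy as np


def direction_bounds(A, s, gk, mu):
    """Loose explicit bounds on ||d_r||, ||d_a|| and |tau| for the (DFS) solution."""
    r = np.sum(A**2, axis=1) / s               # r_i = ||g_i||^2 / s_i
    eig_lo, eig_hi = mu, mu + r.sum()          # spectrum of H lies in [eig_lo, eig_hi]
    h_norm = np.linalg.norm(A.T @ (1.0 / s))   # ||h|| with h = -A^T S^{-1} e
    g_norm = np.linalg.norm(gk)
    dr_bound = h_norm / eig_lo
    da_bound = g_norm / eig_lo
    tau_bound = (g_norm * dr_bound + 1.0) * eig_hi / g_norm**2
    return dr_bound, da_bound, tau_bound


rng = np.random.default_rng(2)
A = rng.normal(size=(8, 3))
s = rng.uniform(0.05, 1.0, size=8)
gk = rng.normal(size=3)
mu = 0.3

# Direction components from Theorem 5 for comparison.
H = mu * np.eye(3) + A.T @ (A / s[:, None])    # mu I + A^T S^{-1} A
d_r = -np.linalg.solve(H, A.T @ (1.0 / s))
d_a = -np.linalg.solve(H, gk)
tau = -(gk @ d_r + 1.0) / (gk @ d_a)
print(direction_bounds(A, s, gk, mu))
print(np.linalg.norm(d_r), np.linalg.norm(d_a), abs(tau))
```

The role of $\mu > 0$ is visible here: it alone keeps $\lambda_{\min}(H)$ away from zero, independently of how many rows the bundle contains.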


Theorem 8 (Global convergence theorem 2). Let $f(x)$ be a differentiable convex function over $\mathbb{R}^n$ that is bounded below. Let $\{x^k\}$ be the sequence generated by the affine scaling with centering bundle method (ASCBM). Let $g_k$ be the gradient of $f$ at $x^k$. Then

$$\liminf_{k \to \infty} \|g_k\| = 0. \tag{4.45}$$

Proof. Again we start with the assumption that (4.45) is not true. It is enough to show that the algorithm takes infinitely many serious steps. In this case, the sequence has an infinite subsequence of distinct points $\{x^k\}$ where (with appropriate subsequence renumbering) $x^{k+1} = x^k + \lambda_k d^k$, where $\lambda_k$ satisfies both Wolfe conditions (4.29) and (4.30). We would then get the desired result by applying Theorem 7. Suppose then that the number of serious steps that the algorithm takes is finite. This means that there exists $K$ such that for all $k > K$, the algorithm only takes null steps. This implies that the right-side Wolfe condition (4.29) is not satisfied for $k > K$:

$$f(x^k + \lambda_k d^k) > f(x^k) + \sigma_1 \lambda_k g_k^T d^k, \quad \forall k > K,$$

which, since $g_k^T d^k = -1$ by the definition of $d^k$, simplifies to

$$f(x^k + \lambda_k d^k) > f(x^k) - \sigma_1 \lambda_k, \quad \forall k > K. \tag{4.46}$$

The algorithm defines $\lambda_k = \alpha_k \lambda_{\max}$, where $\lambda_{\max}$ is the largest $\lambda > 0$ for which $x^k + \lambda d^k \in \mathcal{N}(x^k; \nu) \cap \{ x : \|x - x^k\|_\infty \le \Delta_k \}$. Since the algorithm is taking only null steps, $x^k$, $g_k$ and $\Delta_k$ are fixed at, say, $x$, $g$ and $\Delta$, respectively. It is easy to see that $\lambda_{\max}$

each null step. Then, as $k \to \infty$, $\lambda_k$ approaches 0 from the right. Dividing both sides of (4.46) by $\lambda_k$ gives

$$\frac{f(x + \lambda_k d^k) - f(x)}{\lambda_k} > -\sigma_1. \tag{4.47}$$

For fixed $d^k$, the left-hand side of the above inequality approaches the directional derivative, which equals $g^T d^k$, thus implying

$$g^T d^k \ge -\sigma_1 > -1,$$

contradicting $g^T d^k = -1$ for all $k$. Hence, the right-side Wolfe condition (4.29) must be satisfied after a finite number of steps. The algorithm will then be able to take a serious step, as guaranteed by Lemma 3. □


Consider the first few steps, say with $k \le n$, of the affine scaling with centering bundle method wherein we use $\mu = 0$ in (DFS). The direction finding subproblem is

$$\begin{aligned} \min\ & \tfrac{1}{2}\sum_{i=1}^{k-1} s_i^{-1}(g_i^T d + 1)^2 && (4.48)\\ \text{s.t.}\ & g_k^T d = -1. \end{aligned}$$

If the vectors $g_i$, $i \le k$ are linearly independent, then the system of linear equations

$$g_i^T d = -1, \quad i \le k$$

is consistent. It has a unique solution if $k = n$ and infinitely many solutions if $k < n$. In any case, it is easy to see that such solutions solve (4.48), since they make the objective zero. This leads to the following lemma:

Lemma 4. Let $k \le n$ and suppose that $g_i$, $i \le k$ are linearly independent. Then any solution $d^k$ to (4.48) satisfies

$$g_i^T d^k = -1, \quad i \le k. \tag{4.49}$$

tions for quadratic $f$ with Hessian $H$:

$$(d^j)^T H d^k = \frac{(x^{j+1} - x^j)^T H d^k}{\lambda_j} = \frac{(g_{j+1} - g_j)^T d^k}{\lambda_j} = 0, \quad \forall j < k,$$

where the last equality follows from (4.49). This leads to the following corollary establishing the link with conjugate gradients.

Corollary 2. Suppose $f$ is a quadratic function with Hessian matrix $H$. Starting with an iterate with gradient $g_0$ and a direction $d^0$ satisfying $g_0^T d^0 = -1$, an algorithm using the direction finding subproblem (4.48) and exact line searches is a conjugate gradient method and will terminate in at most $n$ steps.

Proof. Follows from the conjugacy of the resulting directions and the well-known result on conjugate gradient methods. □
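Corollary 2 can be checked numerically. The NumPy sketch below (the names `min_norm_direction` and `quadratic_wls_cg` are hypothetical) takes, at each step, the minimum-norm solution of $g_i^T d = -1$, $i \le k$, which solves (4.48) by the preceding argument, performs an exact line search on a random strictly convex quadratic, and then checks that the directions are $H$-conjugate and that the gradient vanishes after $n$ steps. It is an illustration under these assumptions, not the dissertation's implementation.

```python
import numpy as np

def min_norm_direction(G):
    """Minimum-norm d with G @ d = -1; the rows of G are the gradients g_0..g_k."""
    d, *_ = np.linalg.lstsq(G, -np.ones(G.shape[0]), rcond=None)
    return d

def quadratic_wls_cg(H, b, x0, rtol=1e-10):
    """Minimize 0.5 x'Hx - b'x using min-norm solutions of (4.48), k <= n.

    Uses exact line searches; returns the final iterate and the directions taken.
    """
    x = np.asarray(x0, dtype=float).copy()
    g = H @ x - b
    g0 = np.linalg.norm(g)
    grads, dirs = [], []
    for _ in range(len(b)):                 # at most n steps
        if np.linalg.norm(g) <= rtol * g0:
            break
        grads.append(g)
        d = min_norm_direction(np.array(grads))
        lam = -(g @ d) / (d @ H @ d)        # exact line search on the quadratic
        x = x + lam * d
        g = H @ x - b
        dirs.append(d)
    return x, np.array(dirs)

rng = np.random.default_rng(0)
n = 6
M = rng.standard_normal((n, n))
H = M @ M.T + n * np.eye(n)                 # symmetric positive definite Hessian
b = rng.standard_normal(n)
x, D = quadratic_wls_cg(H, b, np.zeros(n))
C = D @ H @ D.T                             # numerically diagonal if the d's are conjugate
```

Because each $d^k$ lies in the span of $g_0, \dots, g_k$ and the directions are conjugate, the new gradient stays linearly independent of the stored ones, so the lemma's hypothesis keeps holding until termination.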

It remains to decide how to solve (4.48), given that it has infinitely many solutions for $k < n$. One can find its unique minimum-norm solution, say from an underdetermined QR method. In fact, the weights $s_i$ do not matter as long as $k < n$, and this solution is equivalent to the direction found using Zoutendijk's [59] subproblem

$$\min \left\{ d^T d : g_i^T d = -1, \ i \le k \right\}.$$
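One way to obtain that minimum-norm solution with an underdetermined QR factorization is sketched below. It factors $G^T$ (whose columns are the $g_i$) with NumPy's reduced QR, which is a common choice but not necessarily the one used in the dissertation's code; the name `min_norm_via_qr` is hypothetical.

```python
import numpy as np

def min_norm_via_qr(G):
    """Minimum-norm solution of G d = -1 for G with k <= n linearly independent rows.

    Factor G' = Q R (Q: n x k with orthonormal columns, R: k x k upper triangular).
    Any d = Q y + z with Q'z = 0 gives G d = R' y, so the minimum-norm solution
    has z = 0 and R' y = -1.
    """
    Q, R = np.linalg.qr(G.T)                   # reduced QR of the n x k matrix G'
    y = np.linalg.solve(R.T, -np.ones(G.shape[0]))
    return Q @ y

rng = np.random.default_rng(1)
G = rng.standard_normal((3, 5))                # k = 3 gradients in R^5
d = min_norm_via_qr(G)
```

The result coincides with Zoutendijk's direction $G^T (G G^T)^{-1}(-e)$, but the QR route avoids forming $G G^T$ explicitly.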

The interesting case arises when $k$ starts getting bigger than $n$. It is here that the weights $s_i$ come in handy. As before, we want negative inner products with the $g_i$'s that have small $s_i$'s, if possible maintaining the inner product value of $-1$. If exactly $m < n$ conjugacy relationships (inner products of $-1$ between the direction and the gradients) are desired, then the values of $s_i$ can be used to decide which of the gradients will maintain conjugacy (in contrast to Zoutendijk's suggestion of simply keeping the latest ones). Zoutendijk further suggested replacing the relationships $g_i^T d = -1$, $i \le k$, by $-1 - \epsilon \le g_i^T d \le -1 + \epsilon$ for some small $\epsilon > 0$. Solving (4.48) with $k > n$ comes very close to realizing this suggestion.
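For $k > n$, (4.48) is an equality-constrained weighted least squares problem that can be solved through its KKT system $\begin{pmatrix} G^T S^{-2} G & g_k \\ g_k^T & 0 \end{pmatrix} \begin{pmatrix} d \\ \nu \end{pmatrix} = \begin{pmatrix} -G^T S^{-2} e \\ -1 \end{pmatrix}$. The NumPy sketch below (the name `solve_448` and the small hand-built bundle are hypothetical) gives two of six gradients in $\mathbb{R}^3$ small weights $s_i$; the solution then keeps their inner products close to $-1$, while the current gradient meets the constraint exactly.

```python
import numpy as np

def solve_448(G, s):
    """Solve (4.48): min 0.5 * sum_i s_i^-2 (g_i'd + 1)^2  s.t.  g_k'd = -1.

    G: (k, n) gradients g_1..g_k; the last row is the current g_k. G must have
    full column rank so that G' S^-2 G is nonsingular.
    s: (k,) positive weights; a small s_i pulls g_i'd toward -1.
    """
    k, n = G.shape
    w = 1.0 / np.asarray(s, dtype=float) ** 2
    M = G.T @ (w[:, None] * G)                 # G' S^-2 G
    K = np.zeros((n + 1, n + 1))               # KKT matrix
    K[:n, :n] = M
    K[:n, n] = G[-1]
    K[n, :n] = G[-1]
    rhs = np.concatenate([-(G.T @ w), [-1.0]])
    return np.linalg.solve(K, rhs)[:n]

G = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],                 # favored (small s)
              [1.0, 1.0, 1.0],
              [0.0, 0.0, 1.0],                 # favored (small s)
              [1.0, -1.0, 0.0],
              [1.0, 1.0, 0.0]])                # current gradient g_k
s = np.array([1.0, 1e-3, 1.0, 1e-3, 1.0, 1.0])
d = solve_448(G, s)
```

Here the favored gradients and $g_k$ determine $d \approx (0, -1, -1)$; the remaining gradients, which cannot all reach $-1$ as well, only perturb it by about $10^{-5}$.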

4.5 Implementation Aspects

4.5.1 Factorizations


[...] and subgradient evaluation is [...] are fixed and only an $O(n^2)$ rank-one update is needed to factor in the new cut. Each [...] reformed with $O(m_s n^2)$ flops, where $m_s$ is the number of cuts included in the direction finding subproblem. In addition, a complete $O(n^3)$ refactorization is needed. If $m_s$ is large compared to $n$, then the formation of the matrix dominates each serious step. Note, however, that the algorithm does not require $m_s$ to increase indefinitely; $m_s$ is in fact bounded by a parameter MMAX. Various constraint dropping strategies can be employed in deciding which cuts will be included in the symmetric positive definite system. A more aggressive strategy of keeping MMAX very small, say $O(1)$, is discussed in the next subsection.
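The cost structure of a serious step can be sketched as follows, assuming NumPy's `cholesky` in place of LINPACK's DPOFA and a hypothetical dropping rule that keeps the MMAX cuts with the smallest $s_i$ (the most heavily weighted ones). Neither the rule nor the name `form_and_factor` comes from the dissertation.

```python
import numpy as np

def form_and_factor(A, s, mu, mmax):
    """Form mu*I + A' S^-2 A from at most mmax cuts and return its Cholesky factor.

    Hypothetical dropping rule: keep the mmax cuts with the smallest s_i.
    Forming the matrix costs O(m_s n^2) flops; the factorization costs O(n^3).
    """
    keep = np.argsort(s)[:mmax]                  # indices of the retained cuts
    Ak, wk = A[keep], 1.0 / s[keep] ** 2
    n = A.shape[1]
    M = mu * np.eye(n) + Ak.T @ (wk[:, None] * Ak)
    L = np.linalg.cholesky(M)                    # lower triangular, M = L L'
    return L, keep

rng = np.random.default_rng(2)
A = rng.standard_normal((10, 4))                 # 10 cuts (subgradients) in R^4
s = rng.uniform(0.1, 1.0, 10)
L, keep = form_and_factor(A, s, mu=1e-3, mmax=6)
```

With $m_s$ capped at MMAX $= O(n)$, both the formation and the refactorization are $O(n^3)$, matching the serious-step estimate in the text.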


If MMAX is taken to be $O(n)$, then null steps will take $O(n^2)$ flops while serious steps will take $O(n^3)$ flops. If function and subgradient evaluation is cheap, it may then be desirable to do more null steps in the hope of reducing the number of serious steps. However, we do not know if increasing null steps will in fact decrease the overall [...] even be possible that the line search variation of Theorem 7 (for the smooth case, no null steps are necessary; for the nonsmooth case, null steps will be needed only if the direction does not give sufficient decrease, which would be rare since the function would be differentiable almost everywhere) may give fewer iterations at the expense of more function and subgradient evaluations. This interaction of the number of null [...] future research, as we have so far only tested the variation which uses "central cuts" [...] direction finding at each iteration. This can be done by fixing $s_i$ at every iteration and making it independent of the linearization error. Note that the only condition required of $s_i$ for purposes of global convergence is the boundedness condition

$$s_i \ge \nu > 0, \quad \forall i.$$

However, to satisfy the motivation of generating directions of descent with respect to nearby subgradients, we still need $s_i$ to be small for points close to the current iterate. This can be done, say, by making $s_i$ a function of the iteration counter $k$, e.g., making $s_k$ inversely proportional to $k$. In so doing, the matrix $A^T S^{-2} A$ can be refactored by using a rank-one update each time, since all past "slacks" will stay at their original values.
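With each $s_k$ fixed once its cut is created, every new cut adds the rank-one term $s_k^{-2} g_k g_k^T$, so the Cholesky factor can be updated in $O(n^2)$ flops instead of being recomputed. The NumPy sketch below uses the standard rotation-based update of a lower-triangular factor (LINPACK's DCHUD performs the corresponding update on an upper-triangular factor). The schedule $s_k = c_0/k$, the starting matrix $\mu I$, and all names are illustrative assumptions.

```python
import numpy as np

def chol_rank_one_update(L, x):
    """Return lower-triangular L1 with L1 L1' = L L' + x x', in O(n^2) flops."""
    L = L.copy()
    x = np.asarray(x, dtype=float).copy()
    n = x.size
    for k in range(n):
        r = np.hypot(L[k, k], x[k])              # new diagonal entry
        c = r / L[k, k]                          # rotation cosine (scaled)
        s = x[k] / L[k, k]                       # rotation sine (scaled)
        L[k, k] = r
        L[k + 1:, k] = (L[k + 1:, k] + s * x[k + 1:]) / c
        x[k + 1:] = c * x[k + 1:] - s * L[k + 1:, k]
    return L

# Hypothetical schedule: the cut created at iteration k gets s_k = c0 / k and is
# never rescaled, so mu*I + sum_k s_k^-2 g_k g_k' grows by one rank-one term.
rng = np.random.default_rng(3)
n, mu, c0 = 4, 1.0, 2.0
L = np.sqrt(mu) * np.eye(n)                      # Cholesky factor of mu*I
M = mu * np.eye(n)                               # explicit matrix, kept for checking
for k in range(1, 8):
    g = rng.standard_normal(n)
    s_k = c0 / k
    L = chol_rank_one_update(L, g / s_k)         # adds (1/s_k^2) g g'
    M += np.outer(g, g) / s_k ** 2
```

The explicit matrix `M` is maintained only to confirm that `L @ L.T` reproduces it; an implementation would keep just the factor.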

4.5.2  Memory 

In the convex case, the algorithm requires the storage of $m_s$ function values and subgradients used in the matrix $A^T S^{-2} A$. For the variation which uses central cutting plane techniques, $m_s$ has to be at least $O(n)$ to ensure the presence of enough cuts that approximate the level set. If memory (or the lack thereof) is a prime consideration, then the line search variant can be used and $m_s$ can be kept fairly small, say 3 or 4 in the extreme. If $\mu = 0$, then the scheme reduces to a limited-memory conjugate gradient variant. Convergence for the smooth case still follows, although one would expect more iterations for smaller values of $m_s$. Implementation of this limited-memory scheme is one area of future research. For our implementation, we keep all the cuts.
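A limited-memory variant with $\mu = 0$ might look like the sketch below. The dissertation did not implement this scheme, so everything here is an assumption made for illustration: the deque holding the last $m_s$ gradients, the steepest-descent safeguard, the quadratic test function, and the name `limited_memory_run`. Because each direction is a descent direction and the line search is exact, the objective values cannot increase.

```python
from collections import deque
import numpy as np

def limited_memory_run(H, b, x0, ms=3, iters=50):
    """Hypothetical limited-memory variant (mu = 0) on f(x) = 0.5 x'Hx - b'x.

    Keeps only the last ms gradients, takes the minimum-norm d with g_i'd = -1
    for the stored g_i, and uses an exact line search. Returns objective values.
    """
    f = lambda z: 0.5 * z @ H @ z - b @ z
    x = np.asarray(x0, dtype=float).copy()
    bundle = deque(maxlen=ms)                 # the oldest gradient is dropped automatically
    values = [f(x)]
    for _ in range(iters):
        g = H @ x - b
        if np.linalg.norm(g) < 1e-12:
            break
        bundle.append(g)
        G = np.array(bundle)
        d, *_ = np.linalg.lstsq(G, -np.ones(len(bundle)), rcond=None)
        if g @ d >= 0:                        # safeguard: fall back to steepest descent
            d = -g / (g @ g)
        x = x + (-(g @ d) / (d @ H @ d)) * d  # exact line search
        values.append(f(x))
    return np.array(values)

rng = np.random.default_rng(4)
n = 10
M = rng.standard_normal((n, n))
H = M @ M.T + np.eye(n)
b = rng.standard_normal(n)
vals = limited_memory_run(H, b, np.zeros(n), ms=3)
```

Only $m_s$ vectors of length $n$ are stored, which is the memory saving the text has in mind.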


Some of the test problems that we use in the section on computational results require bounds on the variables. While constraints in general can be handled in a nonsmooth fashion using penalty functions, we instead use a variation based on the dual affine scaling algorithm for linear programming. Note that for the linear program

$$\min \ c^T x \quad \text{s.t.} \quad l \le x \le u$$

with only lower and upper bounds, the affine scaling direction is given by

$$d = -\left( S_1^{-1} + S_2^{-1} \right)^{-2} c,$$

where $S_1 = \operatorname{diag}(x - l)$ and $S_2 = \operatorname{diag}(u - x)$. From Theorem 4, this direction is a positive scale of the solution to the constrained WLS system

$$\min_d \ \tfrac{1}{2} \left\| \left( S_1^{-1} + S_2^{-1} \right) d \right\|^2 \quad \text{s.t.} \quad c^T d = -1.$$
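The positive-scale relationship can be checked directly for any diagonal scaling $D$: minimizing $\tfrac{1}{2} d^T D d$ subject to $c^T d = -1$ gives $d = -D^{-1}c / (c^T D^{-1} c)$, a positive multiple of $-D^{-1}c$. The NumPy sketch below assumes $D = (S_1^{-1} + S_2^{-1})^2$ for the box constraints, matching the scaling in the affine scaling direction; the function names are hypothetical.

```python
import numpy as np

def box_affine_scaling(c, x, l, u):
    """Affine scaling direction d = -(S1^-1 + S2^-1)^-2 c for min c'x, l <= x <= u.

    S1 = diag(x - l), S2 = diag(u - x); x must lie strictly between l and u.
    """
    D = (1.0 / (x - l) + 1.0 / (u - x)) ** 2    # diagonal of (S1^-1 + S2^-1)^2
    return -c / D

def box_wls(c, x, l, u):
    """Closed-form solution of min 0.5*||(S1^-1 + S2^-1) d||^2  s.t.  c'd = -1."""
    D = (1.0 / (x - l) + 1.0 / (u - x)) ** 2
    v = c / D                                   # D^-1 c
    return -v / (c @ v)

c = np.array([1.0, -2.0, 0.5])
l, u = np.zeros(3), np.ones(3)
x = np.array([0.2, 0.5, 0.9])
d_as = box_affine_scaling(c, x, l, u)
d_w = box_wls(c, x, l, u)
ratio = d_as / d_w                              # constant and equal to c' D^-1 c > 0
```

Components near a bound receive a large entry in $D$ and hence a short step, which is the affine scaling behavior the bound term is meant to import.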

Recall now the direction finding subproblem

$$\text{(DFS)} \quad \min_d \ \tfrac{1}{2}\mu \|d\|^2 + \tfrac{1}{2} \left\| S^{-1}(A d + e) \right\|^2 \quad \text{s.t.} \quad f'(x^k)^T d = -1.$$

Note that the component $\tfrac{1}{2}\mu\|d\|^2$ in the objective function of (DFS) can in fact be induced by the bound constraints [...]

Hence, a natural extension for problems with bound constraints $l \le x \le u$ is to replace $\tfrac{1}{2}\mu\|d\|^2$ with $\tfrac{1}{2}\left\| \left( S_1^{-1} + S_2^{-1} \right) d \right\|^2$. The resulting search direction will then have an affine scaling component with respect to the bound constraints $l \le x \le u$.

We do not yet have a proof of convergence for this variation. We use it primarily for ease of implementation, as it requires only the addition of numbers to the diagonal elements of $A^T S^{-2} A$. The ratio test in step 2 of ASCBM must of course be changed accordingly.
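A sketch of the bound-modified direction finding step follows, assuming the bound term $\tfrac{1}{2}\|(S_1^{-1} + S_2^{-1})d\|^2$ and a direct KKT solve. The bound term only adds $(1/(x_j - l_j) + 1/(u_j - x_j))^2$ to the $j$-th diagonal entry of $A^T S^{-2} A$. The companion `box_ratio_test` shows one plausible way to change the ratio test so that the step stays inside $l \le x \le u$; it is an assumption, not the dissertation's step 2, and both function names are hypothetical.

```python
import numpy as np

def dfs_with_bounds(A, s, g, x, l, u):
    """Solve min 0.5*||(S1^-1 + S2^-1) d||^2 + 0.5*||S^-1 (A d + e)||^2  s.t.  g'd = -1.

    A: (m, n) bundle subgradients, s: (m,) positive weights, g: current subgradient.
    """
    n = x.size
    w = 1.0 / s ** 2
    M = A.T @ (w[:, None] * A)                              # A' S^-2 A
    M[np.diag_indices(n)] += (1.0 / (x - l) + 1.0 / (u - x)) ** 2   # bound diagonal
    K = np.zeros((n + 1, n + 1))                            # KKT matrix
    K[:n, :n] = M
    K[:n, n] = g
    K[n, :n] = g
    rhs = np.concatenate([-(A.T @ w), [-1.0]])
    return np.linalg.solve(K, rhs)[:n]

def box_ratio_test(x, d, l, u):
    """Largest lam >= 0 with l <= x + lam*d <= u (hypothetical modified ratio test)."""
    steps = [np.inf]
    neg, pos = d < 0, d > 0
    if neg.any():
        steps.append(np.min((l[neg] - x[neg]) / d[neg]))
    if pos.any():
        steps.append(np.min((u[pos] - x[pos]) / d[pos]))
    return min(steps)

rng = np.random.default_rng(5)
m, n = 6, 4
A = rng.standard_normal((m, n))
s = rng.uniform(0.5, 2.0, m)
g = A[-1]                                                   # current subgradient
l, u = np.zeros(n), np.ones(n)
x = np.full(n, 0.5)
d = dfs_with_bounds(A, s, g, x, l, u)
lam = box_ratio_test(x, d, l, u)
```

At `lam` the blocking component lands exactly on its bound; a practical step would take a fraction of it to stay in the interior.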


We report below results on the performance of the above algorithm with respect to other algorithms for nondifferentiable optimization. We use as benchmarks test problems that have been used in the literature on nondifferentiable optimization. All the programs are written in FORTRAN 77 using double precision and run on an IBM ES/9000-740 computer with 3 vector facilities and 512M of core memory. The programs were compiled using the IBM FORTRAN VS compiler with default options [...] OPT([...]) and VECTOR. The following is a brief description of the algorithms tested and some implementation notes. Full details regarding the algorithms can be obtained from the cited references.

1. (ASCBM) Affine scaling with centering bundle method. The LINPACK routines DPOFA, DCHUD, and DPOSL were used to compute Cholesky factors, perform rank-one updates on the Cholesky factors, and solve the linear systems, respectively. The Basic Linear Algebra Subprograms (BLAS) were obtained from the IBM Engineering and Scientific Subroutine Library (ESSL).

2. (NOA3) Version 3 of the implementation of Kiwiel's [25, 26] proximal bundle method. The code uses adaptations of LINPACK routines in its factorizations. BLAS routines were called from the ESSL library.

3. (BT and BTCLC) Unconstrained and linearly constrained versions of Schramm and Zowe's [13] bundle trust region method; implementation details are described in Outrata et al. [42].

4. (GLP) Generalized linear programming: the column generation version of Kelley's [33] cutting plane method.

The benchmark problems can be divided into three groups: a) three small classical academic test problems (Lemaréchal [35] and Shor [45]); b) the Goffin-Bertsekas nonlinear multicommodity flow problems (Goffin [11] and Gafni and Bertsekas [8]); and c) duals of randomly generated linear and quadratic integer programs (Hearn and Lawphongpanich [18]). Details on the test problems can be found in the cited references. We give below a sketch of the problems and the computational results.

Before we give the results, it should be noted that the most widely accepted measure of performance for nonsmooth optimization is the number of function evaluations needed to reach a prescribed precision. A secondary, albeit potentially misleading, measure is the CPU time taken by the programs. Even when programs are run on the same machine, as was done in this case, one must be careful in interpreting results based on CPU time. This is exemplified by the dramatic difference (time-wise) between the bundle code of Kiwiel on the one hand and that of Schramm and Zowe on the other. These two methods have very similar direction finding subproblems, but their implementations are radically different. On one of the test sets, one will see a hundredfold difference in CPU times when the function evaluation counts differ only by a factor of less than two.

Test 1. Shor's minimax problem SHOR [45], p. 138, N = 5. The problem is unconstrained. Optimal objective value = 22.60016.

Test 2. Lemaréchal's minimax problem MXQUAD [35], p. 151, N = 10. The problem is unconstrained. Best known objective value = -0.841408.

Test 3. Lemaréchal's polyhedral problem TR48 [35], N = 48. The problem is unconstrained. Optimal objective value = -638,565.


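Table 4.1. Results for problem SHOR (N = 5)

Method   Reset   Fnc. Eval   Iterations    f(x*)          CPU secs.
ASCBM    n.a.    ?           44 / 29*      22.600162096   ?
NOA3     10      37          22 / 14*      22.600162097   ?
BT       10      70          62            22.600162009   ?

(CPU secs.: only the values 0.030 and 0.543 are legible; their rows could not be determined.)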

Table 4.2. Results for problem MXQUAD (N = 10)

Method   Reset   Fnc. Eval   Iterations    f(x*)           CPU secs.
ASCBM    n.a.    83          41 / 42*      -0.8414083215   0.098
NOA3     20      52          28 / 23*      -0.8414083299   0.076
BT       20      56          45            -0.8414082496   0.047

Test 4. Goffin's 22-arc, high traffic intensity problem GB22h [11], N = 22. The optimal solution has strictly positive variables. Optimal objective value = 103.41202 with [...]

Test 5. Goffin's 148-arc problem GB148 [11], N = 148. This is a dual problem where the variables have nonnegativity restrictions. The optimal solution has strictly [...]


Table 4.4. Results for problem GB22h (N = 22)

Method   Reset   Fnc. Eval   Iterations   f(x*)          CPU secs.
ASCBM    n.a.    ?           ?            103.41201999   0.404
NOA3     44      ?           ?            103.41130138   4.751
NOA3     220     ?           ?            103.41130565   17.271
BTCLC    44      ?           ?            103.40836426   15.728
BTCLC    220     ?           ?            103.41201475   50.221
GLP      n.a.    ?           ?            103.41201993   7.192


Table 4.5. Results for problem GB148 (N = 148)

Method   Reset   Fnc. Eval   Iterations    f(x*)
?        ?       ?           176 / 691*    -151.92687073
?        ?       ?           128 / 757*    -151.92687078
?        ?       ?           117 / 666*    -151.92687076
?        ?       ?           1137          -151.92687078
?        ?       ?           1010          -151.92687078


[...] Hearn and Lawphongpanich's polyhedral LIPS [18], N = 50. Results are averages for 5 randomly generated problems. These are randomly generated [...] nonnegative dual variables. The results are averages over 5 random test problems. RELERR is the error relative to the best objective obtained by ASCBM:

$$\text{RELERR} = \frac{f(\text{ASCBM}) - f(\text{Method})}{|f(\text{ASCBM})| + 1}.$$

[...] implies that ASCBM [...]
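RELERR is straightforward to compute; the small Python sketch below mirrors the formula, with made-up objective values in the usage line (they are not taken from the tables). The function name `relerr` is hypothetical.

```python
def relerr(f_ascbm, f_method):
    """RELERR = (f(ASCBM) - f(Method)) / (|f(ASCBM)| + 1)."""
    return (f_ascbm - f_method) / (abs(f_ascbm) + 1.0)

print(relerr(-2.0, -2.5))   # prints 0.16666666666666666
```

The "+1" in the denominator keeps the measure meaningful when the ASCBM objective is close to zero.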
Table 4.6. Results for problem LIPS (N = 50)

Method   Reset   Fnc. Eval   Iterations        Relerr     CPU secs.
ASCBM    n.a.    76.2        47.4 / 28.8*      -          0.399
NOA3     50      66.2        33.0 / 32.2*      +3.8e-07   0.300
BTCLC    50      80.2        70.0              +3.8e-07   3.166
GLP      n.a.    1?8.2       187.2 / 1309.4*   +7.2e-08   2.290


Table 4.7. Results for problem LIPS (N = 150)

Method   Reset   Fnc. Eval   Iterations       Relerr     CPU secs.
ASCBM    n.a.    205.0       86.2 / 118.8*    -          7.01
NOA3     150     185.2       53.0 / 131.2*    +3.6e-09   5.55
BTCLC    150     259.2       222.4            +4.0e-09   217.28
GLP      (excessive CPU time)


Table 4.8. Results for problem LIPS (N = 300)

Method   Reset   Fnc. Eval   Iterations       Relerr     CPU secs.
ASCBM    n.a.    ?           122.0 / 229.0*   -          49.6
NOA3     150     ?           75.0 / 384.8*    -1.7e-07   41.0
NOA3     300     ?           75.0 / 302.4*    +6.4e-09   56.2
BTCLC    150     ?           571.0            -1.6e-05   2224.9
BTCLC    300     ?           502.8            -1.?e-06   2563.1


Tests 9-10. Hearn and Lawphongpanich's [18] duals of quadratic integer programs (QIPS). The algorithms are tested for N = 150 and 300 nonnegative dual variables, most of which are zero at the optimal solution. The results below are averages for 5 randomly generated problems.


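Table 4.9. Results for problem QIPS (N = 150)

Method   Reset   Fnc. Eval   Iterations       Relerr     CPU secs.
ASCBM    n.a.    217.2       110.8 / 106.4*   -          ?
NOA3     150     94.0        35.2 / 57.8*     -3.4e-09   ?
BTCLC    ?       ?           ?                +2.1e-09   ?

(CPU secs.: only the values 1.11 and 100.85 are legible; their rows could not be determined.)

*Serious step / Null step.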

Table 4.10. Results for problem QIPS (N = 300)

Method   Reset   Fnc. Eval   Iterations       Relerr     CPU secs.
ASCBM    n.a.    ?           152.6 / 213.1*   -          ?0.729
NOA3     150     187.0       48.0 / 138.0*    ?          7.269
BTCLC    ?       252.0       242.0            ?          798.908

(Relerr: only the value +3.7e-09 is legible; it belongs to NOA3 or BTCLC.)

The results are quite mixed, but there are a few striking observations that can be made:

1. The ASCBM program turned out to be robust and was able to solve all test problems within reasonable effort. On the other hand, the two other bundle codes had a difficult time with the ill-conditioned multicommodity flow problem GB22h.

2. [...]

3. For the LIPS problems, ASCBM tends to get better as the problem size grows.

4. NOA3 performed best on the QIPS problems, followed by BTCLC (though certainly not time-wise). There seems to be no difference between the performance of ASCBM on the LIPS and the QIPS problems.

5. NOA3 seems to have the most efficient implementation in terms of time per iteration, followed by ASCBM (except for the very small N = 5, 10 problems, in which the ASCBM iterations were somehow more efficient). BT/BTCLC seems to have been poorly implemented, considering that its direction finding subproblem is very similar to that of NOA3.


Looking back, the weighted least squares techniques behind ASCBM were motivated by the goal of avoiding the problem of zigzags posed by ill-conditioned or stiff [...] GB148 can be treated as unconstrained problems). This indicates that weighted least squares techniques provide a promising approach to achieving these motivating goals.


CHAPTER  5 

SUMMARY AND EXTENSIONS FOR FURTHER RESEARCH


We have shown how WLS theory provides a way of integrating and balancing often conflicting requirements imposed on an acceptable search direction. These requirements are often quantifiable using inner product relationships between the search direction and other vectors specific to a given problem. By assigning quantified weights to how strongly these relationships should be satisfied, we arrive at [...]

$$w_i : \left( g_i^T d = [\ldots] \right), \quad i \in I^{+}, \tag{5.1}$$

$$w_i : \left( g_i^T d = 0 \right), \quad i \in I^{0}, \tag{5.2}$$

$$w_i : \left( g_i^T d = -1 \right), \quad i \in I^{-}, \tag{5.3}$$

for given vectors $g_i$ and index sets $I^{+}$, $I^{0}$ and $I^{-}$.

In this dissertation, we were able to specify these inner product relationships and their corresponding weights for the problems of analytic centering, interior point linear programming, and convex nondifferentiable optimization.

Starting with a weighted least squares (WLS) analysis of the boundary behavior of the Newton direction for analytic centering, we came up with a modified Newton centering direction with better boundary behavior. Next, a WLS analysis of the boundary behavior of the dual affine scaling algorithm revealed the need for a centering component to make the algorithm robust. Computational results showed that using the modified Newton direction in lieu of the traditional and commonly used Newton centering direction as the centering component leads to a more robust interior point algorithm for linear programming.
(5.3) 
The affine scaling and modified Newton directions came up again in tandem after a WLS formulation of the problem of trying to obtain descent with respect to nearby subgradients in the minimization of a nonsmooth function. Elements from trust region [...] with (modified Newton) centering bundle method (ASCBM). Preliminary computational [...] nondifferentiable optimization.

[...] programming has shown promise, much work still needs to be done. The following areas of research can be explored further.


We have clearly seen the advantages of using the modified Newton centering direction [...] question remains as to what is the best strategy for combining the affine scaling and centering directions. A better strategy than the one implemented above may be a predictor-corrector scheme (see, for example, Todd [51, 52]) where a centering step is taken only when necessary. This could prevent the situation where the initial centering steps were not needed and resulted only in extra iterations. Furthermore, the algorithms implemented here centered only during the initial stage, when it is conceivable that the long steps taken could later lead to a "bad" iterate which may require additional centering. In this case, a predictor-corrector approach would be appropriate. The difficulty in implementing a predictor-corrector scheme may be in determining a reliable correcting criterion that would detect the need for taking a [...]

The analyses in this paper were restricted to pure dual methods that work exclusively on the dual space (LD). As there is an equivalence that can be made between the primal affine scaling and the dual affine scaling algorithms (see, for example, Monma and Morton [40]), a separate analysis for pure primal methods may be superfluous. A separate analysis is needed for primal-dual methods (for example, Kojima et al. [27] and the OB1 project by Lustig et al. [37]), which work on the primal and dual [...]


A promising extension is solving linear programs from warm starts (i.e., restarting from a prior optimal solution to solve a perturbed problem), as such situations are often difficult for traditional interior point methods. In conjunction with the techniques for convex nondifferentiable optimization, the decomposition of linear programs via an interior point method can also be explored.


More experimental results are needed to determine classes of problems where ASCBM might prove efficient. As the method seems to work well on ill-conditioned problems, a good start would be to tackle ill-conditioned network problems in the literature. Another set of problems of interest would be those for which function evaluation is expensive, say costing more than the $O(n^3)$ flops required by the direction finding subproblem during serious steps. Methods for computing the direction more efficiently should be explored. One approach would be to study in detail possible extensions that exploit the algorithm's connection to conjugate gradients. As the algorithm is convergent for smooth problems, it should be tested and compared especially on smooth problems that have proved to be difficult.

The lack of a convergence proof for the nonsmooth case is something that needs immediate attention. One approach is to study more closely the algorithm's links to affine scaling and centering on the level set approximations, paying particularly close attention to duality relationships.


[...] $g_i^T d = -\kappa_i$, to make the method better scaled. A suitable [...] is necessary if we want superlinear convergence to be a goal that the resulting algorithm should achieve.


Interior point methods have not been as successful when applied to nonlinear programming [...] Hessian near the optimum, is the poor near-boundary behavior of the Newton centering direction. In the case of linear programming, this direction tends to move parallel [...] constraints would make the Newton centering direction tend to go outside the feasible region. As we did for the LP case, we could then try to use the modified Newton direction instead, since its tendency would be to move inside the feasible set.

Finally, it may be worthwhile to analyze existing directions by formulating them as WLS solutions, in a manner similar to what we did for the affine scaling and centering directions. This may give us a better understanding of their properties and perhaps enable us to reverse-engineer other directions with more desirable properties.